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Summary 


This  project  has  established  the  foundations  for  developing  a  'Distributed  Middleware- 
based  Architecture  for  Fault-tolerant  Computing  Over  Distributed  Repositories'.  During 
the  project,  we  have  identified  the  sub  problems  and  related  issues,  and  investigated 
creation  of  an  integrated  solution.  This  effort  is  by  no  means  complete.  In  each  section, 
we  have  articulated  the  need  for  extending  proposed  solutions  to  reach  a  practically 
useful  approach/solution. 

Furthermore,  we  have  not  addressed  all  the  problems  identified  in  the  proposed 
architecture.  There  are  a  number  of  important  problems  that  need  to  be  addressed  to 
obtain  a  complete  solution.  We  will  be  happy  to  explore  opportunities  with  AFRL  or 
other  agencies  to  continue  this  work. 

In  this  project,  we  have  implemented  proof-principle  systems  to  show  the  effectiveness 
of  the  proposed  approaches.  The  simulator  has  been  extended  to  incorporate  features 
specific  to  this  scenario  and  analysis  has  been  performed.  The  next  logical  step  is  to 
create  a  prototype  by  bringing  all  the  components  together  and  then  move  towards  a 
testbed.  The  team  has  been  chosen  from  the  outset  keeping  the  long  term  goals  in 
mind  and  team  is  well-positioned  for  additional  work  on  this  project. 


1. 


INTRODUCTION 


The  overall  architecture  includes  a  middleware  in  each  node  that  has  a  number  of 
services  (based  on  SOA)  for  Collecting,  Managing,  Replicating  data  and  Meta  data  for 
the  purposes  of  routing  and  query  processing.  Briefly,  the  following  services  are 
described  in  this  report: 

•  Use  case  scenarios 

•  Meta  data  maintained  at  each  node 

•  Query  plan  generation  algorithms,  heuristics,  and  query  execution 

•  Replica  determination  and  management 

•  Incorporating  the  above  into  the  simulator 

•  Software  developed 

First-effort  solutions  are  described  in  this  report  along  with  a  use  case  that  describes  how  the 
solutions  fit  within  an  operational  scenario.  They  can  be  improved  further  by  using  additional 
characteristics  of  the  problem  under  consideration.  For  example,  currently,  a  static  plan  is 
generated  for  each  query  and  the  query  is  processed  sequentially  by  performing  a  sequence  of 
operations  at  different  nodes.  This  can  be  further  improved  by  generating  a  dynamic  plan  at 
each  node  where  the  query  is  processed  to  accommodate  intermittent  connectivity  among 
nodes.  Furthermore,  a  query  plan  can  be  executed  in  parallel  (using  either  static  or  dynamic 
plans)  to  improve  response  time. 

Currently,  replication  is  assumed  to  be  single  copy  and  complete  for  each  relation.  This  can  be 
extended  to  multi-copy  and  partial  replica  again  to  accommodate  intermittent  connectivity. 
Above-mentioned  extensions  will  add  complexity  to  both  Meta  data  management  and  query 
processing. 


APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED. 


1 


Figure  1  shows  the  proposed  Service  Oriented  Architecture  (SOA)  for  the  middleware  for 
supporting  query  processing  over  distributed  repositories  and  accommodate  fault  tolerance. 
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Figure  1:  Architecture 

Message  format  and  the  algorithms  presented  in  the  previous  report  are  assumed. 


APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED. 


2 


2. 


METHODS 


The  project  mainly  employed  analysis  and  simulation  during  the  investigations.  Extensions  of 
the  project  would  involve  studies  on  test-bed  and  eventually  deployment  in  real  world 
environments. 

Management  of  metadata  for  the  purpose  of  query  management  under  volatile  network 
conditions  is  discussed  in  Section  5.  Exhaustive  investigations  into  query  plan  generation  are 
discussed  in  Section  6.  Network  management  is  one  of  the  key  challenges  to  fault-tolerant 
computing  in  a  network  of  UAVs.  We  discuss  methods  to  maintain  network  configurations  in 
Section  7.  Section  7  also  deals  replication  management  to  ensure  availability  of  data  when 
nodes  (UAVs)  move  in  and  out  of  the  network.  Finally  in  Section  8,  we  discuss  simulation 
methodology  and  present  results. 
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3. 


ASSUMPTIONS  AND  PROCEDURES 


In  order  to  ground  the  analysis  for  scalable  fault-tolerant  repositories  we  needed  to  define  a  use 
case  which  can  help  understand  the  requirements,  issues  and  constraints  over  which  we  will 
need  to  operate. 

Our  first  step  was  to  cite  some  high  level  requirements  for  any  such  use  case  analysis.  The  use 
case  should  depict  operations  having: 

•  10  -12  Airborne  platforms  (UAVs,  Helos,  Fighters,  AWACs,  ...) 

Which  travelling  at  various  speeds  (~100  kts,  ~200  kts,  ~500  kts) 
some  in  formation,  some  on  independent  tracks 

•  Ground  platforms  (stationary  and  moving) 

Stationary  -  semi  permanent  positions 
Mobile  (vehicle,  foot) 

•  Intermittent  connectivity 

These  operations  involve: 

•  Aggregation  of  distributed  information 
Static  &  dynamic  Query  Plan  generation 
Join  queries 

•  Replication  and  information  movement  services 

Given  these  requirements  our  approach 

•  Identified  actors 

•  Specified  information  needs  in  scenario 

•  Used  representative  locations  for  information  sources 

•  Shows  information  flow 

Example  scenarios  that  we  considered  were: 

•  Cooperative  Combat  Air  Patrol.  Mixture  of  UAVs,  manned  fighters,  and  AWACs 
cooperatively  defend  Taiwan  Strait. 

Information  needed:  Signals,  Lines  of  Bearing,  contact  positions,  tracks 

•  Armed  Reconnaissance  /  Combat  Search  &  Rescue.  Group  of  UAVs  search  for  specific 
targets  in  a  large  area. 

Information  needed:  Detection  positions,  object  characteristics 

•  Unplanned  Tactical  Intelligence  Support.  Ground  unit  reguests  images  from  any  platform 
transiting  area. 

Information  needed:  Area  of  Interest,  object  positions,  object  characteristics,  collection 
period,  images 

•  Persistent  Surveillance.  Multiple  UAVs  conduct  constant  surveillance  of  Waziristan  area 
over  many  months. 

Information  needed:  Object  characteristics,  object  positions,  patterns,  collection  periods 
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•  Cooperative  Close  Air  Support.  Multiple  platforms  execute  a  coordinated  attack  on 
forces  poised  to  overrun  friendly  forces  on  the  Korean  peninsula. 

Information  needed:  Blue  positions,  red  positions,  object  characteristics,  risk  levels, 
confidence  levels 

•  Cooperative  Air  Interdiction.  Multiple  platforms  execute  a  coordinated  attack  on  artillery 
targets  north  of  the  Korean  DMZ. 

Information  needed:  Positions,  Lines  of  Bearing,  SAM  detections 

•  Cooperative  Suppression  of  Enemy  Air  Defenses  (SEAD).  Multiple  platforms  execute  a 
coordinated  attack  on  the  Iranian  IADS. 

Information  needed:  Signals,  Lines  of  Bearing,  positions 

At  the  August  2010  quarterly  review  meeting,  AFRL  representatives  said  the  Cooperative  Close 
Air  Support  scenario  was  of  highest  interest.  We  have  begun  work  on  the  Cooperative  Close  Air 
Support  use  case  and  have  identified  the  following  queries  (specified  in  operational  language): 

1.  Get  all  images  taken  within  last  two  days  of  the  area  bounded  by  latitude, 
longitudel  and  Iatitude2,  longitude2. 

2.  Get  all  images  taken  six  to  eight  months  ago  of  the  area  bounded  by  latitudel, 
longitudel  and  Iatitude2,  longitude2. 

3.  Get  all  SAM  locations  within  12  NM  of  the  area  bounded  by  latitudel,  longitudel 
and  Iatitude2,  longitude2. 

4.  Get  all  AAA  locations  within  2  NM  of  the  area  bounded  by  latitudel,  longitudel  and 
Iatitude2,  longitude2. 

5.  Get  the  latitude,  longitude,  and  elevation  for  any  object  of  type  rocket  launcher  in 
the  area  bounded  by  latitudel,  longitudel  and  Iatitude2,  longitude2. 

6.  Get  the  current  ceiling  and  visibility  in  the  area  bounded  by  latitudel,  longitudel 
and  Iatitude2,  longitude2. 

7.  Get  all  forward  blue  force  positions  within  2  NM  of  the  area  bounded  by  latitudel, 
longitudel  and  Iatitude2,  longitude2. 

The  selected  scenario  in  fleshed  out  in  Section  4. 
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4. 


USE  CASE  SCENARIOS 


Background 

The  contracted  work  involved  research  on  ways  of  storing  and  retrieving  information  from 
multiple  airborne  and  ground  based  networked  resources.  This  use  case  provides  an  operational 
context  for  the  technical  capabilities  formulated  during  this  effort. 

As  seen  in  Figure  2,  the  chosen  scenario  involves  an  outbreak  of  hostilities  on  the  Korean 
peninsula  which  poses  different,  and  in  some  ways  more  difficult,  challenges  for  the  envisioned 
technology  than  those  that  would  be  experienced  in  current  operations  in  Afghanistan.  In  a 
Korean  conflict,  we  will  not  have  air  supremacy  and  the  air  defenses  will  be  significant,  making 
long  duration  overflights  or  orbits  over  enemy  territory  impossible. 

The  use  case  also  incorporates  some  capabilities  that  don't  currently  exist  but  are  anticipated  to 
exist  within  the  next  five  years.  These  include  cooperative  autonomous  unmanned  aerial 
vehicles  (AUAV),  improved  communications  and  networking  equipment,  and  technologies  for 
fusion,  distribution,  and  retrieval  of  information 


Figure  2:  Scenario  Location 
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Scenario 


It's  wintertime  in  Korea,  the  ground  is  covered  in  snow,  and  there  are  frequent  low  ceilings  and 
visibility.  An  army  maneuver  unit  advancing  towards  Pyongyang  has  been  stopped  in  position 
and  is  taking  heavy  fire  and  casualties  from  dug  in,  concealed  enemy  forces. 

Two  Apache  helicopters  were  lost  to  heavy  enemy  fire  when  they  tried  to  take  out  the  enemy 
position.  The  army  maneuver  commander  requests  immediate  close  air  support  to  eliminate  the 
enemy  threat.  The  enemy  position  of  concern  is  in  trees  with  friendly  troops  within  100  yards. 
Even  at  this  close  range,  friendly  ground  forces  cannot  see  the  firing  position  because  of  the 
foliage  and  weather  conditions.  The  area  is  within  range  of  numerous  enemy  anti-aircraft 
artillery  (AAA)  and  surface  to  air  missile  (SAM)  sites. 

Two  A-lOCs  are  tasked  with  providing  the  close  air  support.  Two  autonomous  mini-unmanned 
aerial  vehicles  flying  in  an  orbit  in  the  low  UAV  corridor  five  miles  from  the  Forward  Line  Own 
Troops  (FLOT)  will  also  be  involved.  The  mission  will  be  directed  by  a  Joint  Terminal  Attack 
Controller  (JTAC)  equipped  with  an  advanced  terminal. 

In  preparation  for  the  attack,  the  two  AUAVs  are  tasked  to  conduct  surveillance  over  the  target 
area.  They  construct  a  plan  to  overfly  the  area  from  opposite  directions  to  capture  imagery  from 
different  angles,  and  at  different  times  to  increase  their  survivability. 

The  fused  imagery  is  relayed  to  the  JTAC.  The  JTAC  also  requests  any  imagery  of  the  area 
captured  prior  to  wintertime.  The  combined  imagery  reveals  the  enemy  position  with  enough 
certainty  to  derive  coordinates. 

As  the  A-lOs  are  proceeding  to  the  contact  point,  the  JTAC  forwards  the  best  image  to  the  A- 
10s.  The  A-lOs  decide  on  a  staggered  attack  from  two  different  initial  points  (IPs).  They  receive 
the  nine  line  information  from  the  JTAC.  The  first  aircraft  proceeds  inbound  and  is  cleared  by  the 
JTAC  after  the  aircraft  is  visually  acquired.  The  lead  takes  heavy  fire  but  is  able  to  take  a  good 
shot.  Fie  verbally  relays  target  area  visual  information  to  the  second  aircraft  that  is  now  inbound 
and  has  also  been  cleared  upon  visual  by  the  JTAC.  The  second  aircraft  uses  the  fire  and  smoke 
from  the  first  attack  to  deliver  a  devastating  blow  on  the  position. 

Figure  3  illustrates  the  mission  execution  elements. 
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FLOT 


Figure  3:  Mission  Execution  Elements  and  Plan 


Sequence  of  Events 

Table  1  shows  the  high  level  mission  events  that  take  place  during  the  mission. 


Table  1  Joint  Mission  Thread 
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A4S4UWTM 

A  more  detailed  description  related  to  the  scenario  is  the  following: 

1.  Army  maneuver  commander  requests  immediate  close  air  support  on  a  heavy  weapons 
target  located  at  approximate  latitude  and  longitude  coordinates. 

2.  The  Joint  Air  Operations  Center  (JAOC)  tasks  two  A-lOCs  with  the  mission. 
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3.  The  requesting  commander  is  informed  about  the  mission  and  planned  ordinance. 

4.  Mission  details  are  passed  to  the  JTAC  and  the  assigned  aircraft. 

5.  AUAVs  are  used  under  low  ceilings  in  a  high  threat  environment  to  get  close  up  images  of 
the  target  area. 

6.  The  JTAC  uses  his  advanced  terminal  to  gather  and  collate  information  to  precisely  fix  the 
target  coordinates.  The  advanced  terminal  displays  a  risk  envelope  around  the  target  based 
on  planned  weapons  and  attack  parameters. 

7.  The  attacking  aircraft  are  given  updated  mission  information  via  AWACS  while  enroute  to 
the  contact  point. 

8.  The  attacking  aircraft  and  the  JTAC  establish  communications  and  exchange  information  via 
voice  and  ground  terminal-to-aircraft  system  links. 

9.  The  attacking  aircraft  execute  a  coordinated  attack  on  the  target  and  are  visually  cleared  to 
expend  ordinance  by  the  JTAC. 

10.  AUAVs  are  tasked  to  conduct  post  mission  battle  damage  assessment. 


Distributed  Queries 

The  following  queries  are  executed  on  networked,  distributed  battlespace  information  sources 
on  behalf  of  the  primary  mission  elements  tasked  with  the  mission. 

1.  Get  all  images  taken  within  last  two  days  of  the  area  bounded  by  latitudel,  longitudel  and 
Iatitude2,  longitude2.  This  information  is  needed  by  the  JTAC  to  precisely  fix  the  target. 

2.  Get  all  images  taken  six  to  eight  months  ago  of  the  area  bounded  by  latitudel,  longitudel  and 
Iatitude2,  longitude2.  This  information  about  the  target  area  without  snow  cover  is  needed  by 
the  JTAC  to  precisely  fix  the  target. 

3.  Get  all  SAM  locations  and  types  within  12  NM  of  the  area  bounded  by  latitudel,  longitudel 
and  Iatitude2,  longitude2.  This  information  is  needed  by  the  attacking  aircraft. 

4.  Get  all  AAA  locations  and  types  within  2  NM  of  the  area  bounded  by  latitudel,  longitudel  and 
Iatitude2,  longitude2.  This  information  is  needed  by  the  attacking  aircraft. 

5.  Get  the  latitude,  longitude,  and  elevation  for  any  object  of  type  X  in  the  area  bounded  by 
latitudel,  longitudel  and  Iatitude2,  longitude2.  This  information  is  needed  by  the  JTAC  and  the 
attacking  aircraft  to  precisely  fix  the  target. 

6.  Get  the  current  ceiling  and  visibility  in  the  area  bounded  by  latitudel,  longitudel  and 
Iatitude2,  longitude2.  This  information  is  needed  by  the  JTAC  and  the  attacking  aircraft  for 
attack  planning  (weapons  and  tactics). 

7.  Get  all  forward  blue  force  positions  within  2  NM  of  the  area  bounded  by  latitudel,  longitudel 
and  Iatitude2,  longitude2.  This  information  is  needed  by  the  JTAC  and  the  attacking  aircraft  to 
avoid  fratricide. 
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8.  Get  risk  envelope  for  planned  munitions  under  planned  delivery  conditions.  This  information 
is  needed  by  the  JTAC  and  the  attacking  aircraft  to  avoid  fratricide. 

Discussion 

The  central  actor  for  most  of  the  distributed  queries  in  this  scenario  is  the  JTAC  as  this  person 
has  'on-scene'  awareness  of  the  ground  situation  and  is  best  able  the  assess  what  additional 
information  is  needed  to  execute  a  successful  attack.  Secondary  actors  are  the  pilots  of  the 
attacking  aircraft  who  need  the  best  available  information  on  the  locations  of  air  defense 
systems  and  the  current  weather  in  the  target  area,  particularly  ceiling  and  visibility. 

The  central  problem  that  is  driving  the  target-related  queries  is  the  need  to  determine  the 
precise  coordinates  of  a  target  that  is  concealed  and  in  close  proximity  to  friendly  forces.  To 
solve  this  problem,  light-weight,  expendable  AUAVs  are  tasked  to  overfly  the  target  area  to 
collect  imagery,  and  queries  are  made  to  distributed  resources  for  other  recent  imagery  of  the 
same  area,  and  imagery  from  a  time  period  before  there  was  snow  cover.  The  distributed 
sources  for  this  information  could  be  airborne  platforms,  theater  ground  systems  such  as  the 
Distributed  Common  Ground  System  (DCGS),  and  national  systems. 

The  aggregation  and  fusing  of  retrieved  imagery  to  derive  target  coordinates  would  normally  be 
done  in  theater  Intelligence,  Surveillance,  and  Reconnaissance  (ISR)  and  Command  and  Control 
(C2)  systems.  However,  in  thinking  about  future  technical  possibilities,  and  the  need  for 
increased  responsiveness  in  an  environment  such  as  a  future  Korean  conflict,  these  operations 
could  be  accomplished  by  the  JTAC  using  an  intelligent  terminal  which  would  also  be  used  to 
relay  the  processed  information  (annotated  maps,  images  and  coordinates)  to  the  attacking 
aircraft.  Additionally,  the  attacking  aircraft  have  automated  systems  that  display  threat  and 
friendly  position  information  received  over  data  link. 
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5.  METADATA  MANAGEMENT 

This  section  describes  the  data  as  well  as  the  Meta  data  maintained  at  each  node  for  the 
purpose  of  query  processing. 

The  meta-data  is  maintained  using  a  relational  database.  The  schema  for  each  piece  of  meta¬ 
data  is  described  below. 

Data  at  each  node  is  assumed  to  be  a  relation  with  the  following  format  (approximately  100 
bytes  per  tuple/row).  Refer  to  Table  2  for  details. 

Rjj  correspond  to  relation  R,  at  node  j.  R,i  (i  =  j)  will  be  used  to  represent  the  primary  copy  of  a 
relation  at  node  i.  Ry  (i  <>  j)  will  be  used  to  indicate  the  replica  of  Ri  in  node  j. 

A  field  'TimeOfUpdate'  is  maintained  for  each  update  that  happens  over  the  Meta  Data  to 
estimate  the  accuracy  of  data  and  keep  a  track  of  how  recent  the  update  has  been  done. 

5.1  Data  stored  at  each  node 

5.1.1  Original  Data 

Relation  /?,,  is  shown  below: 


Table  2  Relation  Format 


Timestamp 

Nodeid 

Lat 

Long 

Obj type 

Obj desc 

Object ptr 

8  bytes 

4  bytes 

4  bytes 

4  bytes 

8  chars 

Varchar (64) 

Pointer  (8 
bytes) 

A  number  of  additional  information  about  the  characteristics  of  each  /?„  is  maintained  in  a  node  / 
(and  periodically  propagated  to  all  other  nodes)  for  the  purpose  of  query  plan  generation.  Figure 
2.2,  shows  an  example  of  the  original  data  at  each  node. 

5.1.2  Replicated  Data 

If  a  relation  R,  is  replicated  at  this  node(j),  then  for  each  replicated  relation  Ry,  we  need  to 
maintain  the  same  information  as  in  the  previous  table.  The  difference  is  that  this  information 
may  not  be  current.  Every  Node  maintains  a  copy  of  its  original  relation  that  is  stored  at  some 
other  node.  Ry  will  be  used  to  indicate  the  replica  of  R,  in  node  j. 

Currently,  replication  is  assumed  to  be  a  single  copy  and  complete  for  each  relation. 

5.2  Network  Managed  Data 

Network  Managed  data  is  maintained  and  updated  by  the  middleware  and  thereby  accessed  for 
processing  by  intermediate  steps  of  the  query  plan  generator. 
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5.2.1  Relation  Attribute  Information  at  each  node 

A  number  of  additional  information  about  the  characteristics  of  each  Rn  is  maintained  in  node  i 
and  is  periodically  propagated  to  all  other  nodes  for  the  purpose  of  query  plan  generation  and 
cost  estimation.  Selectivity  for  simple  and  composite  conditions  can  be  inferred  from  Table  3. 

Table  3  Original  Data  at  each  node 


Attr  Name 

Type 

Cardinality 

Position 

Width 

Min 

Value 

Max 

Value 

Unique  values  in 
the  range 

Timestamp 

number 

1200 

1 

100 

50 

140 

90 

Lat 

number 

1200 

2 

4 

10 

100 

90 

ObjType 

varchar 

4000 

3 

64 

20 

350 

330 

ObjPtr 

categorical 

2000 

3 

8 

Null 

Null 

10 

5.2.2  Relation  Information 

In  addition,  we  may  maintain  some  information  about  cardinality  and  tuple  width  of  each 
relation  in  that  node  as  in  table  4. 


Table  4  Relation  Information 


TimeOfUpdate 

RelationName 

Cardinality 

Width 

Ri 

8000 

70 

5.2.3  Relation  to  Node  Mapping  Matrix 

A  Relation  to  Node  mapping  table  as  in  the  Table  5,  is  maintained  by  the  message  management 
system  at  each  node  which  tells  about  the  location  of  the  original  and  the  replica  of  a  Relation.  A 
value  of  0  in  the  replica  node  column  indicates  that  the  replica  is  not  complete  at  this  point  in 
time  and  hence  is  not  considered  for  generating  a  query  plan. 


Table  5  Information  about  the  location  of  replica  of  each  node 


Name 

Original  Node 

Replica  Node 

Ri 

1 

4 

r2 

2 

1 

Rb 

3 

4 

r4 

4 

0 

Rn 

N 

k 
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5.2.4  Connectivity  Map 

A  connectivity  Map  is  maintained  at  each  node  which  checks  for  whether  there  is  connection 
between  any  two  nodes  and  the  current  bandwidth  between  the  two.  If  the  Received  Signal 
Strength  (RSS)  is  zero  or  below  the  considerable  threshold,  connection  is  considered  to  be  0  and, 
1  otherwise.  RSS  value  lies  on  a  scale  of  1  to  10.  LSF  (the  Link  stability  Factor)  is  a  function  of 
rate  of  change  of  RSS  value  over  a  period  of  time.  A  pair  is  considered  for  the  plan  generation  if 
the  RSS  value  at  the  instant  is  1.  A  sample  connectivity  map  is  shown  in  Table  6. 


Table  6  Connectivity  Map 


Node; 

Nodej 

RSS 

LSF 

Bandwidth 

Startup  Cost 

Ri 

Rb 

1 

5 

100 

10 
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6.  QUERY  PLAN  GENERATION,  HEURISTICS,  AND  EXECUTION 

A  query  plan  for  this  scenario  is  envisioned  as  numbered  sequence  of  plan  steps  that  can  be 
easily  interpreted  and  executed  at  any  node.  Table  7  gives  a  description  of  a  plan  format.  Each 
step  includes  the  operation  to  be  applied,  the  data  items  involved,  the  node  where  it  is  applied, 
the  name  of  the  result  and  the  node  where  it  is  created. 

6.1  Query  Plan  Format 

Unlike  traditional  query  processing,  the  plan  needs  to  be  sent  from  node  to  node  (or  partial 
plans  generated  at  each  node  which  is  not  considered  in  this  project)  for  the  purposes  of  query 
processing.  A  counter  indicates  the  next  step  to  be  executed1.  An  example  of  a  query  plan  is 
shown  in  Table  8. 

Plan  counter  (initially  set  to  1  and  incremented  after  the  execution  of  each  operation/row) 


Table  7:  Plan  Format 


Operation  1 

Param 

Operandl 

Operandl 

Loc 

Operand2 

Operand  2 
Location 

Result 

Name 

Result 

Loc 

Operation  2 

Param 

Operandl 

Operandl 

Loc 

Operandl 

Operand2 

Loc 

Result 

Name 

Result 

Loc 

Operation  n 

Param 

Operandl 

Operandl 

Loc 

Operandl 

Operand2 

Loc 

Result 

Name 

Result 

Loc 

Table  8:  Plan  Format  Example 


Operation 

Param 

Operand 

1 

Operand 

1  Loc 

Operand 

2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

A>  100 

Ri 

1 

Null 

Null 

Ri' 

1 

Project 

Ai,  A3,  A4 

Ri' 

1 

Null 

Null 

Ri" 

1 

Move 

Null 

Ri" 

1 

Null 

Null 

R" 

2 

SemiJoin 

A  =  C 

R" 

2 

r2 

2 

SR: 

2 

Join 

B  =  D 

Rl2 

2 

r2" 

2 

JRa 

2 

The  plan  format  described  above  is  sufficient  to  describe  any  arbitrary  relational  query  plan 
involving  selects,  projects,  and  joins  (also  known  as  an  SPJ  query).  The  above  format  can  also 
accommodate  SQL  aggregate  operators,  such  as  a  SUM,  COUNT,  AVERAGE,  MINIMUM,  and 
MAXIMUM  needed  for  the  next  phase. 

1  It  is  also  possible  to  send  the  entire  plan  (or  even  portions  relevant  to  each  node)  to  all  nodes  that 
execute  portions  of  the  plan  in  which  case  it  need  not  be  sent  from  node  to  node! 
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A  query  is  executed  as  follows.  A  complete  plan  is  generated  at  the  node  where  the  query  is 
received  using  the  metadata  stored  in  that  node.  The  plan  is  then  sent  to  the  node  in  which  the 
first  operation  takes  place  (if  it  different  from  the  node  where  the  query  plan  is  generated) 
along  with  a  value  indicating  the  starting  plan  step.  The  interpreter  in  that  node  uses  the  plan 
counter  to  execute  as  many  steps  as  possible  in  that  node.  When  a  move  or  copy  is 
encountered,  it  sends  the  data  as  well  as  the  plan  (most  likely  the  remaining  portion  of  the  plan 
to  reduce  the  amount  of  data  transferred)  to  the  next  node.  This  process  continues  until  the  last 
step  of  the  plan  is  executed.  The  result  of  the  query  will  be  in  the  node  that  executed  the  last 
step. 

Currently,  a  complete  query  plan  is  generated  as  follows.  Each  node  in  the  architecture  has  the 
same  query  plan  generator  and  uses  the  only  the  Metadata  in  that  node.  Note  that  the 
metadata  is  updated  by  the  underlying  mechanism  described  in  Section  2.  The  query  plan  is 
constructed  one  join/semi-join  at  a  time.  Cost  computations  of  partial  plans  are  done  using  the 
statistics  and  formulae  for  computing  selectivity  (described  below).  The  lowest  total  cost  query 
plan  is  used  as  the  final  plan  after  all  possible  plans  are  explored.  This  will  result  in  an  optimal 
plan.  Several  heuristics  are  explored  as  part  of  this  project  to  reduce  the  total  computation 
required  for  generating  a  plan.  These  are  also  compared. 

The  complexity  of  the  optimal  plan  generation  is  kn  where  n  is  the  number  of  joins  and  k  is  the 
number  of  alternatives  for  each  join.  Currently,  k  being  used  is  about  18  (multiple  join 
alternatives,  multiple  semi-join  alternatives  and  the  same  using  replica  as  well)  and  we  are 
assuming  no  more  than  3  joins  in  a  query.  With  this  assumption,  we  will  explore  5000+ 
alternative  query  plans  and  cost  computation  for  each  one  of  them.  We  have  incorporated  some 
heuristics  to  limit  the  number  of  plans  carried  forward  after  each  join.  Simulations  will  be 
performed  to  validate  the  heuristics  to  make  sure  they  are  meaningful. 

Statistics  in  the  form  of  cardinality  and  domain  characteristics  are  used  for  cost  estimation.  Join 
and  condition  selectivity  are  inferred  from  the  statistics  maintained.  Result  sizes  are  also 
estimated  and  its  accuracy  is  important  as  choice  of  the  best  query  plan  is  primarily  based  on 
the  cost  of  data  transfer  based  on  availability  of  connectivity. 

Below,  we  outline  the  statistics  used  for  evaluating  the  cost  of  a  (partial)  query  plan.  Most  of 
these  are  well-established  for  the  relational  model.  We  do  not  include  the  processing  cost  for 
the  operation/plan,  but  only  the  data  transfer  cost.  Processing  cost  depends  upon  the 
availability  of  index  and  other  structures  and  mainly  influences  the  order  of  join  (which  we  take 
into  account  in  our  plan  generation  process).  Beyond  this  project,  it  will  be  useful  to  determine 
what  access  structures  are  meaningful  and  take  the  processing  cost  into  account  as  well.  In  each 
step,  the  plan  can  be  executed  by  converting  it  into  an  SQL  statement  if  there  a  relational 
database  is  used  for  storing  data  in  that  node. 
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6.2  Statistics  used  for  plan  generation 
Card(R):  Number  of  tuples  in  relation  R 

Card(R.A):  Number  of  distinct  values  of  attribute  A  of  R 

Co^  Fixed  cost  to  start  up  a  connection 

LSFjj:  Current  Link  Stability  Factor  between  nodes  /  and  j  (in  bytes/time  unit);  0  if  not 

connected 

Select:  p  is  a  fraction  of  tuples  satisfying  a  given  select  condition 

Result  size:  card(S)  =  p(Ci)  *  card(R)  //  when  select(Ci)  is  applied  to  R 
p  (A  =  value)  =  l/card(R.A) 
p  (A  >  value)  =  max(A)-  value/  max(A)  -  min(A) 
p  (A  <  value)  =  value  -  min(A)  /  max(A)  -  min(A) 
p  (A  *  value)  =  1  -  l/card(R.A) 

p  (A  =  v:  AND  B=  v2)  =  l/card(R.A)  *  l/card(R.B)  //assuming  independence 
p  (A  =  v:  OR  B=  v2)  =  l/card(R.A)  +  l/card(R.B) 

For  a  select  operation,  width  of  the  result  is  the  same  as  that  of  the  operand 
Project: For  S  =  project(A1(  A2, Am)[R]  with  attributes  Ah  A ^  ...,  An  and  m  <  n 
card(S)  =  Card(R),  if  projected  attributes  include  a  key  of  R, 

=  max{card(R.A|), , ..ca rd ( R. Aj)}  otherwise 

Join:  T=  R[A=B]S 

card  (T)  is  always  <  card  (R)  *  card(S) 

Card(R[A=B]S)=  max{card(R),  card(S)} ,  if  A  is  a  key  of  R  and  S  a  foreign  key 

Else,  Card(R[A=B]S)  =  card  (S.B)/  card(dom(B)) 

Theoretically,  Join  Selectivity,  JS=  card(result)/  card(cross  product) 

width(T)  =width(R)  +  width(S)  //width  always  refers  to  the  width  of  a  tuple 

In  case  of  Natural  Join,  since  natural  join  also  eliminates  one  of  the  join 
attributes,  the  width  of  join  attributes  is  subtracted  from  the  width  of  result. 
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Cost=  By  *  card(S)  *  width(S)  if  Relation  5  is  sent  to  the  node  of  R. 
Cost  =  By  *  card(R)  *  width  (R)  if  Relation  R  is  sent  to  the  node  of  S. 


Semi-Join:  T=  R<A=B]S  that  is,  S  is  projected  on  B  and  sent  to  the  node  of  R  for  the 

semi-join 

p(selectivity  of  semi-join)  is  needed  to  estimate  the  result  size 

p  =  card  (S.B)/card(dom(B))  //  card(dom(B))  denotes  the  number  of  distinct 
values  in  B's  domain 

Card  (T)  =  p  *  card(R) 

width(T)=  width(R)  +  width(S.B) 

Cost  (Move(S.B  in  node  /  to  node /))  =  C0  +  By  *  card(S.B)  *  width(S.B). 

The  cost  for  T  =  R  [A=B>S  can  be  computed  similarly. 

6.3  Query  Plan  generation  Design 

In  this  section,  we  present  the  algorithm  that  has  been  developed  and  implemented  for  this 
project.  We  start  with  the  UML  diagrams,  input  from  which  a  QueryObject  is  generated  which  is 
used  by  the  queryPlanGenerator  to  generate  all  plans  for  that  query  using:  join  methods,  semi¬ 
join  methods,  hybrid  of  join  and  semi  join  methods.  Single-copy  replication  is  also  taken  into 
consideration  for  generating  the  above  plans. 

The  following  UML  shows  the  classes  used  for  the  implementation  of  the  query  plan  generator. 


QueryObject 

relations  :  ArrayList<String> 

projectionAttributes:  HashMap<String,ArrayList<String» 
selectConjuncts  :  HashMap<String,ArrayList<Conjunct» 
joinConjuncts  :  HashMap<HashSet<String>,ArrayList<Conjunct» 
targetNode:  String 
dbinteractions:  dblnteractions 
+QueryObject(String  query) 

+getProjectionAttributes(String  relation):  ArrayList<String> 
+getRelations():  ArrayList<String> 

+getTa  rgetNode():String 

+getSelectConditions(String  relation):  ArrayList<Condition> 
+getJoinConditions(String  relationl):  ArrayList<Condition> 
+getJoinConditions(String  relationl, String  relation2):  ArrayList<Condition> 
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+toString()  :  String 

-isSelect(Condition  testCondition):  boolean 
-isJoin(Condition  testCondition):  Boolean 

a  1 


|  1...* 

Conjunct 

conditions:  ArrayList<Condition> 
Conjunct(ArrayList<Condition>) 
getConditions():ArrayList<Condition> 
toString()  :  String 


Condition 
leftOperand:  String 
rightOperand:  String 
operator:  Operator 
not:  Boolean 

Condition(*Full  Argument*) 

getters... 

toString()  :  String 


DBInteractions 

conn:  Connection 
establishDBConnection() 

checkAttributeNamefattributeName,  relationName  :  String)  :  Boolean 
getRelationl\lame(attributeName  :  String)  :  ArrayList 
relLocations(relationName  :  String)  :  ArrayList 
relCardinality(relationl\lame  :  String)  :  long 
relWidth(relationName  :  String)  :  long 

attributeDetails(attributel\lame,  relationName  :  String)  :  ArrayList 
connectivityDetailsfnodel,  node2  :  Int) :  ArrayList 
getJoinSelectivity(joinCondition  :  Condition)  :  Double 

getSelectSelectivity(conjunctCondList:  ArrayList<Conjunct>,  relationName  :  String)  :  Double 
getCondSelectivity(selectCondition  :  Condition,  relationName  :  String)  :  Double 


PlanSpace 

planList :  ArrayList<aPlan> 

PlanSpace() 

createPlansjqueryObject:  QueryObject,  dbObjetc:  dblnteractions) 
getNumPlansj):  Int 
getPlanfindex :  int):  aPlan 
sort() 

toString() :  String 

_ 


|  18Ak 
aPlan 

stepList :  ArrayList<planStep> 
semiJoins:  Stack<String> 
currentLocations:  HashMap<String,  String> 
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joinNames:  HashMap<String,  String> 

semiJoinConditions:  HashMap<String,  String> 

totalCost :  double 

planNo  :  int 

planType:  int 

numJoins  :  int 

totalJoins:  int 

getTotalCost() :  double 

getStep(index  :  int) :  planStep 

addStep(**full  Step  arguments**) 

addStep(step  :  planStep) 

checkFor  Join(join  :  String) :  boolean 

addJoin(step:  PlanStep) 

addSemiJoin(step:  PlanStep) 

getJoinResultLoc(Rname  :  String) :  String 

getNumJoins() :  int 

getPlanTypeQ:  int 

getTotalJoins() :  int 

toString() :  String 

a  1 


PlanStep 

stepData  :  String)] 

0:  Operation  :  String 
1:  Param  :  String 
2:  operandl :  String 
3:  operand2  :  String 
4:  resultName  :  String 
5:  oplLocation  :  String 
6:  op2Location  :  String 
7:  resultLocation  :  String 
cost :  Double 
Full  argument  constructor 
getters... 
setters... 


The  query  to  be  processed  is  parsed  (using  the  JQLParser,  an  open-source  software)  and  its 
components  are  packaged  into  a  QueryObject  object.  The  QueryObject  is  constructed  from  a 
String  containing  the  SQL  query  -  the  format  for  the  String  is  a  standard  SQL  SPJ  (SELECT- 
PROJECT-JOIN)  query  preceded  by  TARGET  node  number  to  which  the  query  is  sent  and  where 
the  result  is  expected.  Currently,  the  result  of  a  query  is  sent  to  only  the  target  node.  However, 
it  is  possible  to  extend  the  algorithm  to  send  the  results  to  multiple  nodes  (and  even  to  optimize 
for  the  set  of  nodes)  if  need  be.  To  simplify  parsing,  select  and  join  conditions  are  assumed  to  be 
given  in  conjunctive  normal  form  (or  CNF).  The  QueryObject  constructor  parses  the  string  and 
populates  internal  data  structures  with  the  target  node,  projected  attributes,  and  select/join 
conditions.  Once  the  query  object  has  been  created,  it  is  passed  as  an  argument  to  the 
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PlanGenerator  object.  The  PlanGenerator  object  can  call  on  public  methods  of  the  QueryObject 
to  get  ArrayLists  of  all  projection  attributes,  join  conditions,  and  select  conditions.  Additionally, 
methods  exist  for  getting  all  projection  attributes  and  conditions  per  relation.  For  example,  a  call 
of  getProjectionAttributesf'rl")  returns  an  ArrayList  containing  all  projection  attributes  from  the 
relation  "rl". 

QueryObject  contains  elements  of  a  query  as  described  below: 

The  FROM  relations  are  stored  as  an  ArrayList  of  Strings;  no  special  data  structures  are  required, 
and  the  user  will  simply  ask  for  all  relations. 

Projection  attributes  are  stored  as  a  HashMap  (for  efficiency  of  lookup).  Given  a  key  (of  type 
String)  of  the  relation,  an  ArrayList  of  all  projection  attributes  (type  string)  on  that  relation  is 
returned. 

Join  and  Select  conditions  each  have  a  separate  HashMap.  They  are  stored  similar  to  projection 
attributes,  but  with  a  few  differences;  an  ArrayList  of  Conjuncts  is  returned  in  each  case.  In  the 
case  of  Join,  since  two  relations  are  involved  in  a  join,  the  key  is  a  pair  of  relations  (HashSet  of 
String),  rather  than  a  single  relation. 

The  Conjunct  and  Condition  classes  are  used  to  represent  select  and  join  conditions.  The 
Conjunct  class  contains  an  ArrayList  of  Conditions  -  these  Conditions  are  primitive  conditions  - 
all  they  contain  are  a  left  operand,  a  right  operand,  a  comparison  operator  and  a  Boolean  value 
representing  the  existence  of  a  "NOT"  operator.  When  an  ArrayList  of  Conjuncts  is  returned  by 
any  method  of  the  QueryObject,  it  is  assumed  that  all  Conjuncts  contained  by  the  ArrayList 
should  be  "AND"ed  together,  and  that  all  the  conditions  of  an  individual  Conjunct  object  should 
be  "OR"d  together. 

Currently,  we  are  parsing  an  input  string  containing  SELECT,  FROM,  and  WHERE  clauses.  Some 
reasonable  assumptions  are  made  on  the  select  and  join  conditions  (e.g.,  being  n  CNF).  The 
WHERE  condition  is  parsed  by  our  code  and  the  rest  by  the  JQLParser.  The  next  step  is  to  use 
JQLParser  to  parse  everything  to  populate  the  QueryObject. 

Plan  Generation  Objects 

PlanStep:  Each  PlanStep  contains  seven  strings,  one  operator,  parameters  (as  string),  2 

operands,  2  operand  locations,  a  result  name  and  a  result  location.  In  addition 
each  step  also  stores  a  cost.  The  cost  is  zero  except  for  steps  involving  move  or 
copy  operation. 

The  functions  for  PlanStep  consist  of  getters,  setters  and  a  toString  to  print,  so  it 
is  purely  a  data  storage  class. 

aPlan:  Each  aPlan  contains  an  ArrayList  parameterized  to  PlanStep,  which  is  the  body 

of  a  plan,  a  total  cost,  a  plan  number,  a  stack,  and  three  HashMaps.  The  stack 
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keeps  track  of  which  relations  are  needed  for  the  second  half  of  a  semijoin 
operation.  The  first  HashMap  maps  relation  names  as  keys  to  the  name  of  the 
relation  they  have  been  joined  to  or  themselves  if  they  have  not  been  joined. 
The  second  HashMap  maps  relation  names  to  their  locations.  The  third 
HashMap  maps  leftover  semijoin  relations  to  their  conditions  after  the  first  part 
of  a  semijoin  has  been  performed  since  after  the  first  semiJoin  has  been 
performed  they  may  not  have  the  same  relation  name. 

The  functions  for  aPlan  consist  of  getters,  setters  and  functions  for  adding  steps 
which  ensure  that  the  maps  are  up  to  date.  It  also  contains  a  toString  function 
that  will  print  all  of  the  steps  in  a  plan  and  a  compareTo  function  that  allows  the 
plans  to  be  sorted  by  their  total  cost. 

PlanSpace:  Contains  an  ArrayList  of  aPlan  that  stores  all  the  plans  generated  during  the 

algorithms  runtime. 

The  functions  of  a  Plan  consist  of  getters,  setters,  a  function  to  sort  all  plans  by 
their  cost  and  a  function  to  populate  the  list  of  plans  with  the  proper  amount 
for  a  query.  It  also  contains  a  toString  function  to  print  all  of  the  plans. 

6.4  Plan  Generation  Algorithm 

The  generator  begins  by  generating  18  distinct  partial  plans  (initially  empty)  for  each  join.  As  an 
exhaustive  algorithm,  it  generates  18k  plans  for  a  query  containing  k  joins.  It  is  evident  that 
this  approach  is  not  viable  beyond  a  few  joins.  This  is  being  done  so  that  later  we  can  compare 
heuristics-based  plans  with  the  optimal  ones  to  analyze  the  effectiveness  of  heuristics  we  come 
up  with  (e.g.,  top-k  in  each  iteration,  top-k  cumulatively,  top-k  for  each  type  of  plan.).  The 
generator  then  iterates  through  the  relation  list  and  creates  the  necessary  select  statements. 
Then  all  of  the  attributes  required  are  projected  on  the  output  and  join  condition  attributes  to 
minimize  the  data  transfer  across  nodes  which  form  the  bulk  of  the  cost  of  query  processing  in 
this  environment.  Since  most  of  the  plans  will  use  these  initial  select  or  project  statements  (to 
reduce  the  width  and  cardinality  of  the  relation),  these  same  statements  are  attached  to  every 
plan. 

For  plan  alternatives  using  joins  the  generator  moves  the  required  relations  to  the  location  of 
the  join  and  then  performs  the  join.  Even  for  this,  projections  are  applied  to  reduce  the  overall 
width  and  cardinality  of  relations  moved.  The  plan  class  takes  care  of  updating  intermediary 
name,  location,  and  condition  information.  Then  the  generator  moves  on  to  the  next  plan. 

For  plan  alternatives  using  semijoins,  the  relation  that  will  be  semijoined  to  is  copied  and 
projected  on  the  attributes  used  in  the  specific  join  condition  to  minimize  data  transfer.  Then  it 
is  moved  to  the  location  of  the  semijoin.  The  semioin  is  performed.  When  the  semijoin  step  is 
added  to  a  plan  the  plan  updates  name,  location,  and  condition  information  and  in  the  case  of 
semijoins  the  output  relation  and  the  relation  that  still  needs  to  be  semijoined  to  finish  the 
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operation  is  added  to  a  stack  to  keep  track  of  the  remaining  semijoins  (to  generate  chained  semi 
join  plans)  .  The  next  plan  is  then  processed. 

For  multiple  joins,  after  all  of  the  plans  have  been  processed  with  the  first  join,  all  of  the  joined 
relations  will  be  projected  on  the  remaining  join  attributes  required  and  then  algorithm  will 
iterate  through  all  the  plans  again  performing  the  remaining  joins  and  semiJoins.  After  a 
relation  has  been  joined,  its  current  location  is  considered  to  be  that  of  the  result  of  the  join 
even  if  it  currently  has  a  replica,  which  may  cause  some  of  the  plans  to  be  the  same. 

After  all  cases  have  been  exhausted,  the  algorithm  goes  through  and  finishes  each  case  by 
iterating  through  the  stack  of  remaining  semijoins  completing  the  remaining  semijoins  in  reverse 
order  and  then  moves  the  final  relation  to  its  output  node. 

During  each  step  of  the  plan  generation,  the  cost  associated  with  a  move  or  a  copy  is  calculated, 
if  there  is  no  direct  connection  between  nodes  the  cost  is  considered  prohibitively  high  and 
value  is  automatically  forced  to  a  very  high  level  by  using  a  very  low  bandwidth  for  the 
calculation.  After  calculation  the  plans  can  be  viewed  in  sorted  form.  The  plan  generator 
generates  a  summary  of:  number  of  plans  generated,  lowest  and  highest  cost  plan  numbers.  It  is 
possible  to  view  any  of  the  plans  in  detail.  The  same  process  is  used  for  generating  plans  using 
heuristics  except  that  a  subset  of  plans  are  used  in  each  iteration  which  are  selected  based  on 
the  specific  heuristic. 

6.5  Examples  of  Generated  Plans 

Consider  a  query  Q  that  is  sent  to  node  1  and  the  results  of  the  query  will  be  sent  to  node  1. 

Q:  TARGET  1 

SELECT  * 

FROM  U_1_D,U_2_D,U_5_D" 

WHERE  ((U_5_D.OBJTYPE=l))" 

AND  ((U_1_D.LAT>U_2_D.LAT))" 

AND  ((U_2_D.LONG>U_5_D.LONG))"); 


Below,  we  present  three  plans  with  their  cost  and  some  information  about  how  they  were 
generated  in  terms  of  plan  combinations. 
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The  lowest  cost  plan  is 

Plan  Number  253 

Total  Cost  493.87333333333333  (in  milli  seconds) 
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This  plan  is  a  combination  of  alternatives  15  (for  the  first  join)  and  2  (for  the  second  join) 


This  plan  using  only  semi  joins  has  only  a  slightly  higher  cost 

Plan  Number  263 

Total  Cost  585.942  (in  milli  seconds) 
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This  plan  is  generated  from  alternatives  15  and  11 
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The  highest  cost  plan  is 

Plan  Number  3 

Total  Cost  175117.8  (in  milli  seconds) 
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This  plan  is  generated  from  alternatives  1  and 
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6.6  Heuristics-Based  plan  generation 


The  purpose  of  generating  a  very  large  plan  space  (in  the  above  description)  is  to  demonstrate  the  cost 
differences  between  best  and  worst  plans.  The  above  algorithm  is  still  not  exhaustive  in  that  it  does  not 
consider  all  join  combinations.  As  can  be  seen  clearly,  there  is  a  significant  difference  between  the  best 
and  the  worst  pan.  During  testing,  we  also  realized  that  the  connectivity  plays  a  critical  role  in  that  if 
only  one  way  connection  is  available  between  nodes,  it  impairs  good  plan  generation. 

We  have  added  a  number  of  heuristics  to  the  above  algorithm  to  compare  their  performance  to  the 
optimal  plan  generated  by  the  above  algorithm.  We  can  use  this  implementation  to  analyze  various 
aspects  such  as  connectivity,  bandwidth,  as  well  as  selectivity  to  understand  the  types  of  plans 
generated.  We  have  identified  the  following  heuristics  to  be  useful  and  have  implemented  then  so  that 
we  can  compare  them  to  the  plan  produced  by  the  above  algorithm. 

Choose  top  k  lowest  cost  partial  plans  in  each  round  for  expansion  or  iteration  (Top-k-iteration) 

Choose  top  k  lowest  cumulative  cost  (up  to  that  round)  plans  in  each  round  of  expansion  (Top-k- 
cumulative) 

Categorize  plans  into  join-based,  semi-join  based,  and  hybrid.  Choose  top  k  lowest  cost  plan  from  each 
category  for  expansion  in  each  round  (Top-k-join-type) 

In  addition  to  the  above,  a  number  of  other  possibilities  such  as  incremental  plan  generation,  looking 
ahead  at  connectivity  and  pruning  plan  alternatives,  getting  dynamic  cost  information  and  then 
generating  partial  plans  are  possible.  Note  that  connectivity  will  play  a  significant  role  not  only  in  the 
generation  of  a  complete  plan,  but  also  its  cost.  If  sufficient  connectivity  does  not  exist  among  the  nodes 
that  participate  in  the  query  (including  the  nodes  that  have  a  replica),  a  complete  query  plan  may  not 
even  be  feasible.  The  presence  of  replica  increases  the  probability  of  generating  a  complete  plan  and  if 
several  exist,  heuristics  hopefully  will  choose  a  good  one  without  having  to  generate  all  plans.  A 
heuristic  that  involves  connectivity  would  be  very  useful  for  this  environment. 

In  Top-k-iteration,  k  (specified  as  a  parameter)  lowest  cost  plans  in  that  round  are  chosen  in  each  round 
for  further  expansion.  This  works  as  a  local  heuristic  and  hopes  that  the  lowers  cost  plans  will  continue 
to  show  good  cost  value. 

In  Top-k-cumulative,  k  (again  specified  as  a  parameter)  lowest  cumulative  cost  plans  up  to  that  round 
are  chosen  in  each  round.  This,  hopefully,  will  improve  upon  the  previous  one  in  choosing  plans  that  are 
good  up  to  that  round. 

The  Top-k-join-type  is  a  different  type  of  heuristic  as  we  have  different  types  of  partial  plans  and  their 
costs  are  likely  to  be  different.  Here,  k  lowest  cost  plans  from  each  type  is  chosen  for  the  next  round.  In 
order  to  compare  them  in  a  fair  manner,  the  k  value  need  to  be  lower  (1/3  as  we  have  3  join  types)  so 
that  the  same  number  of  plans  are  carried  forward  in  each  round. 
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The  above  three  heuristics  have  been  implemented.  The  software  has  two  modes:  interactive  and 
experimental  to  make  it  easy  to  test  and  use.  In  the  interactive  mode,  a  query  can  be  given  at  the 
prompt  (or  in  a  file)  and  a  heuristic  specified  for  its  plan  generation.  The  generator  will  indicate  the 
number  of  plans  generated  as  well  as  the  lowest  and  highest  cost  plans  (along  with  plan  number).  One 
can  output  (or  look  at)  any  plan  in  details  by  typing  the  plan  number.  It  is  also  possible  to  provide  a  file 
input  to  process  multiple  queries  in  this  mode.  The  selectivity  and  cardinality  information  is  statically 
initialized.  The  connectivity  is  also  statically  initialized  at  the  start  of  the  system.  This  can  be  easily 
changed  by  loading  a  new  or  different  relation  before  executing  the  plan  generator. 

In  the  experimental  mode,  one  can  either  provide  input  at  the  prompt  or  submit  a  file  for  processing. 

The  input  consists  of  (whether  given  as  prompted  or  in  a  file):  number  of  queries  to  be  generated,  seed 
for  query  generation,  number  of  connectivity  configurations  to  be  used  in  the  experiment,  seed  for 
configuration,  and  connectivity  factor.  The  generator  has  a  random  query  generator  on  the  schema 
stored  in  the  system  and  generates  the  desired  number  of  queries.  The  seed  is  to  ensure  repeatability  of 
experiments  as  well  as  generate  a  new  sequence  of  pseudo-random  queries.  The  same  is  true  for 
network  configurations  and  its  seed.  The  connectivity  factor  is  use  to  control  the  sparseness  of  the 
connectivity  matrix.  If  there  are  n  nodes,  the  connectivity  factor  can  vary  from  0  to  (n-1),  0  indicating  no 
connectivity  at  all  and  (n-1)  indicating  complete  connectivity.  The  connectivity  itself  is  generated 
randomly  to  satisfy  the  parameters  specified. 

The  above  setup  allows  one  to  perform  different  types  of  experiments.  For  each  query,  connectivity  can 
be  changed  to  determine  how  the  plan  cost  changes  and  can  also  compare  the  optimal  cost  with 
heuristics-based  plan  costs.  It  is  possible  that  due  to  the  connectivity,  a  number  of  plans  cannot  be 
completed  resulting  in  a  high  cost.  Queries  or  connectivity  sequences  can  be  changed,  independently,  by 
varying  the  corresponding  seed. 

6.7  Experimental  Results 

In  order  to  test  the  effectiveness  of  the  heuristics  proposed  for  the  query  plan  generator,  we  performed 
two  sets  of  experiments  that  tested  these  heuristics  across  different  connectivity  matrices  and  several 
different  queries. 

6.7. 1  Impact  of  Connectivity  on  Plan  Cost 

In  this  experiment,  we  generated  a  single  query  (shown  below)  and  computed  the  top  3  plan  cost  using 
the  heuristics  proposed  along  with  the  optimal  plan  cost  on  six  different  configurations  of  the 
connectivity  matrix.  We  had  to  keep  the  connectivity  large;  otherwise,  no  plan  was  generated.  Since  the 
connectivity  matrix  is  large  in  size,  we  do  not  show  it  here.  Instead,  we  display  the  configuration  file 
used  for  this  experiment. 

Query  1 :  target  2 

SELECT  *  FROM  UAV_2_DATA,  UAV_4_DATA,  UAV_6_DATA  WHERE 
(  (UAV_2_DATA.NODEID=76)  )  AND  ( (UAV_2_DATA . L0NG>=8 04 ) )  AND 
( (UAV_6_DATA.LONG<=540)  AND 

( (UAV_2_DATA.LAT=UAV_4_DATA.LAT) )  AND  (UAV_4_DATA . LONG=UAV  6_DATA . LONG) ) ; 
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Configuration  File: 

package  af re¬ 
public  interface  af rlConstants 
{ 

int  NUMBER_OF_QUERIES  =  1; 

//Number  of  queries  to  be  generated  by  the  random  queryGenerator 
String  FILE_NAME  =  MoutputFiles/aprl3_queries_expl . txt" ; 

//Name  of  the  file  the  generated  queries  will  be  stored  in 

String  NETWORK_FILE_NAME  =  MoutputFiles/network/aprl3_network_expl " ; 

//Name  of  the  outpur  file  the  connection  matrix  is  stored  in 
int  SEED  =  4406235; 

//Seed  for  query  generator 

int  NETWORK_DEGREE  =  11; 

//Number  of  connected  nodes 

int  NUM_NETWORKS  =  6; 

//number  of  connection  matrices  to  be  generated  per  query 
int  NETWORK_SEED  =  33152035; 

//seed  for  connection  matrix  generator 
int  NUM_NODES  =  13; 

//number  in  noes  in  connection  matrix 
int  TopKOptimal  =  3; 

//number  of  optimal  plans  to  display;  0  to  show  all 
int  TopKCumulativeCost  =  3; 

//Set  heuristic  to  K  value  to  keep  or  zero  for  inactive 
int  TopKIterationCost  =  3; 

//Set  heuristic  to  K  value  to  keep  or  zero  for  inactive 
int  TopKJoinType  =  9; 

//show  k  number  of  join  type  heuristic  applying  cumulative  heuristic  to  k/3  of  each  type 
boolean  displayNonConnective  =  false; 

//Set  to  true  to  display  non  connective  plans 
boolean  heuristicDebug  =  false; 

//set  to  true  to  dump  heuristic  execution  data  to  files 


The  results  of  the  experiment  are  shown  below. 
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Table  9  Costs  incurred  by  various  approaches 


Network 

Optimal 

Join 

Optimal 

Semi- 

Join 

TopK 

Cumulative 

Join 

TopK 

Cumulative 

Semi-Join 

TopK 

Iterative 

Join 

TopK 

Iterative 

Semi-Join 

TopK 

Join- 

type 

Join 

TopK 

Join-type 

SemiJoin 

1 

2.81 

29.01 

2.81 

2.81 

29.01 

2.81 

2 

7.80 

1.75 

10.88 

268.38 

1.75 

10.88 

1.75 

3 

1.47 

10.88 

268.38 

10.88 

11.73 

1.47 

4 

1.61 

13.58 

1.61 

1.75 

13.58 

1.61 

5 

1.01 

12.32 

1.01 

1.01 

12.32 

1.01 

6 

1.81 

10.65 

268.38 

1.47 

10.65 

2.02 

Effect  of  Heurisitcs  on  Different  Connectivities  for  Single  Query 
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TopKJoin-type  Join  TopK  Join-type  SemiJoin 


Figure  4:  Effect  of  Heuristics  on  different  connectivities 


The  graph  as  well  as  the  table,  shows  the  cost  (in  milliseconds)  incurred  by  the  various  approaches 
towards  generating  the  top-3  best  plans  for  the  given  query. 


Based  on  the  results,  it  can  be  observed  that  the  plan  generation  process  depends  heavily  on  the 
connectivity  between  nodes.  For  many  network  configurations,  no  plan  is  generated  even  in  optimal  join 
case.  However,  amongst  the  different  heuristics,  the  Semi-join  approaches  appear  to  be  suitable  for  the 
current  set  of  connectivity  configurations.  However,  determining  the  exact  relationship  between  the 
type  of  join  and  the  corresponding  costs  of  plan  generation  will  require  further  study. 

6. 7.2  Plan  cost  across  multiple  network  configurations  for  Different  Queries 

In  this  experiment,  we  generated  5  different  queries  (shown  below)  and  tested  the  three  heuristic 
algorithms  along  with  the  optimal  algorithm  on  a  same  configuration  of  the  connectivity  matrix.  Like  the 
previous  experiment,  we  display  the  configuration  file  used  for  this  experiment. 
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Query  1 :  target  2  SELECT  *  FROM  UAV_2_DATA,  UAV_4_DATA,  UAV_5_DATA  WHERE 
( (UAV_2_DATA.NODEID=66) )  AND ( (UAV_2_DATA . LONG>=61 4 ) )  AND  ( (UAV_5_DATA . NODEI D=7 7 ) )  AND 
( (UAV_2_DATA. LAT=UAV_4_DATA . LAT) )  AND  ( (UAV_4_DATA. NODEID=UAV_5_DATA. NODEID) ) ; 

Query  2:  target  5  SELECT  *  FROM  UAV_5_DATA,  UAV_10_DATA  WHERE  ( (UAV_1 0_DATA . LAT=609 ) ) 
AND  ( (UAV_10JDATA.OBJPTR<=246) )  AND  ( (UAV_5_DATA . OB JPTR=UAV_1 0_DATA . OB JPTR) ) ; 

Query  3:  target  9  SELECT  *  FROM  UAV_9_DATA,  UAV_10_DATA,  UAV_5_DATA  WHERE 
( (UAV_9_DATA.LONG>351) )  AND  ( (UAV_9_DATA. LAT>=40) )  AND  ( ( UAV_5_DATA . LONG<=8 04 ) )  AND 
( (UAV_9_DATA.OBJPTR=UAV_10_DATA.OBJPTR) )  AND  ( ( UAV_1 0_DATA . LAT=  UAV_5_DATA. LAT) ) ; 

Query  4:  target  6  SELECT  *  FROM  UAV_6_DATA,  UAV_10_DATA,  UAV_4_DATA  WHERE 
( (UAV_6_DATA.LAT<55) )  AND  ( (UAV_6_DATA . NODEID<=2 6 0 ) )  AND  ( (UAV_4_DATA . NODEI D=22 ) )  AND 
( (UAV_6_DATA. TIMESTAMP=UAV_10JDATA. TIMESTAMP) )  AND  (UAV_1 0_DATA . OB JPTR= 
UAV_4_DATA.OBJPTR)  )  ; 

Query  5:  target  9  SELECT  *  FROM  UAV_9_DATA,  UAV_3_DATA,  UAV_2_DATA,  UAV_4_DATA  WHERE 
( (UAV_9_DATA.TIMESTAMP<=764) )  AND  ( (UAV_9_DATA . L0NG<1 02 ) )  AND  ( (UAV_2_DATA . NODEID=6 6 ) ) 
AND  ( (UAV_2_DATA.LONG>=614) )  AND  ( ( UAV_9_DATA . LAT=UAV_3_DATA . LAT ) )  AND 
( (UAV  3_DATA . LAT=UAV_2  DATA. LAT))  AND  ( (UAV  2  DATA . OB JPTR=UAV_4_DATA . OB JPTR) ) ; 


Configuration  File: 

package  af re¬ 
public  interface  af rlConstants 
{ 

int  NUMBER_OF_QUERIES  =  10; 

//Number  of  queries  to  be  generated  by  the  random  queryGenerator 

String  FILE_NAME  =  "outputFiles/aprl3_queries_exp2 . txt" ; 

//Name  of  the  file  the  generated  queries  will  be  stored  in 

String  NETWORK_FILE_NAME  =  "outputFiles/network/apr 13_network_exp2 " ; 

//Name  of  the  outpur  file  the  connection  matrix  is  stored  in 
int  SEED  =  4406235; 

//seed  for  query  generator 

int  NETWORK_DEGREE  =  11; 

//Number  of  connected  nodes 

int  NUM_NETWORKS  =  1; 

//number  of  connection  matrices  to  be  generated  per  query 
int  NETWORK_SEED  =  32152035; 

//seed  for  connection  matrix  generator 
int  NUM_NODES  =  13; 

//number  in  noes  in  connection  matrix 
int  TopKOptimal  =  3; 

//number  of  optimal  plans  to  display  0  to  show  all 
int  TopKCumulativeCost  =  3; 

//Set  heuristic  to  K  value  to  keep  or  zero  for  inactive 
int  TopKIterationCost  =  3; 

//Set  heuristic  to  K  value  to  keep  or  zero  for  inactive 
int  TopKJoinType  =  9; 

//show  k  number  of  join  type  heuristic  applying  cumulative  heuristic  to  k/3  of  each  type 
boolean  displayNonConnective  =  false; 

//Set  to  true  to  display  non  connective  plans 
boolean  heuristicDebug  =  false; 

//set  to  true  to  dump  heuristic  execution  data  to  files 

} 
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The  results  of  the  experiment  are  shown  below. 


Table  10  Costs  incurred  by  various  approaches  for  Top  K  Plans 


Query 

Optimal 

Join 

Optimal 

Semi-join 

TopK 

Cumulative 

Join 

TopK 

Cumulative 

Semi-Join 

TopK 

Iterative 

Join 

TopK 

Iterative 

Semi-Join 

TopK  Join- 
type  Join 

TopK  Join- 
type  Semi- 
Join 

Query  1 

6.19 

1.19 

172.00 

7.91 

6.19 

1.19 

172.00 

7.91 

Query  2 

52.34 

6.41 

4853.07 

390.31 

6.41 

4853.07 

390.31 

Query  3 

60.76 

125.39 

8557.31 

785.46 

60.76 

125.39 

60.76 

125.39 

Query  4 

0.89 

1.14 

14362.78 

1.21 

1.34 

16201.62 

1.12 

1.21 

Query  5 

0.52 

0.87 

0.70 

1.29 

0.62 

1.95 

0.70 

1.29 

Effect  of  Heurisitcs  on  Different  Queries  on  a  Single  NetwofJ*  o 
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■  Optimal  Join  A  Optimal  Semi-join  BTopK  Cumulative  Join 

■TopK  Cumulative  Semi-Join  a  Top K  Iterative  Join  TopK  Iterative  Semi-Join 

Figure  5  Effect  of  Heuristics  on  different  queries 

The  graph  as  well  as  the  table,  shows  the  cost  (in  milliseconds)  incurred  by  the  various  approaches 
towards  generating  the  top-3  best  plans  (average)  for  5  different  queries  on  the  same  connectivity 
matrix  configuration. 

Based  on  the  results,  it  can  be  observed  that  the  plan  generation  process  depends  hugely  on  the 
connectivity  between  the  nodes.  Similar  to  the  previous  experiments,  amongst  the  different  heuristic 
approaches,  the  Semi-join  approaches  appear  to  be  suitable  for  the  current  set  of  connectivity 
configurations. 

As  connectivity  plays  a  critical  role  even  for  obtaining  a  plan  (not  necessarily  a  good  plan),  replication  becomes 
even  more  important.  We  have  assumed  single-copy  replication  in  this  work.  This  needs  to  be  extended  to  multiple 
copies.  Also,  it  becomes  important  to  include  predicted  connectivity  into  plan  generation  heuristics.  Currently,  it  is 
not  included.  A  different  set  of  heuristics  bases  on  stability  of  the  connections,  multiple  replicas  (both  partial  and 
complete)  seem  to  be  important.  Also,  parallel  pan  execution  and  dynamic  plan  generation  (postponing  the  use  of 
connectivity  information  until  needed)  seems  to  be  critical  for  this  scenario. 
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7. 


REPLICATION  AND  NETWORK  MANAGEMENT 


Message  format  for  different  types  of  messages  is  shown  below  in  Figure  6.  Source  ID  and 
destination  ID  are  5-bit  unique  addresses  of  the  corresponding  nodes  (UAVs).  Multiple 
flags  are  used  to  indicate  the  type  of  data  in  the  payload. 


SP 

SP 
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Q 

Source  ID 

Destination  ID 

Timestamp 

Payload 

Figure  6  :  Message  Format 


Table  10:  Interpretation  of  flags 


Flag  bit 

Length 

Payload 

Content 

Description 

D-Discover 

1  bit 

Meta  data 

Message  to  discover  new  nodes  in  the  network. 

N-Neighbor 

1  bit 

Meta  data  and 
Adjacency  matrix 

Message  to  exchange  connectivity  information  between 
various  nodes  in  the  network. 

I-Direction 

1  bit 

Node  Direction 

Message  to  exchange  the  direction  of  travel  of  a  node. 

M-Data 

2  bits 

Result  of  query 
execution 

M=l:  Message  is  exchanged  directly  with  the  destination 
node  as  a  part  of  the  MOVE  operation. 

M  =11:  Message  is  exchanged  directly  with  the 
destination  node  as  a  part  of  the  COPY  operation. 

M  =10:  Message  is  sent  through  a  series  of  intermediate 
nodes  to  the  destination  node. 

A-Alive 

1  bit 

No  Payload 

Message  to  indicate  a  node's  presence  to  its  neighbors. 

Q-Quit 

1  bit 

No  Payload 

A  node  moving  out  of  range  sends  this  message  to  its 
neighboring  nodes. 

Setup 


Initially,  each  node  is  by  itself  and  not  connected  to  other  nodes  in  the  network.  At  time 
T=tl  (say)  when  another  node  comes  within  communication  range,  a  neighbor  discovery 
message  DISCOVER  is  sent  by  one  or  both  the  nodes.  The  DISCOVER  message  as  shown  in 
Figure  7comprises, 

•  The  source  identifier  (ID) 

•  The  destination  ID  is  typically  a  Broadcast  address 
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•  A  Timestamp 

•  Meta  Data  information 

•  D  =1,  indicating  that  the  payload  contains  Meta  data  information. 
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Destination  ID 

Timestamp 

Payload 

Figure  7:  DISCOVER 


Each  node  receiving  the  DISCOVER  message,  responds  with  a  NEIGHBOR  message  as 
shown  in  Figure  8.  The  NEIGHBOR  message  contains, 

•  The  source  identifier  (ID) 

•  The  destination  ID  is  typically  a  Broadcast  address 

•  A  Timestamp 

•  N  =1,  indicating  that  the  payload  contains  information  about  the  node’s 
connectivity  as  well  as  metadata  information. 

•  1=1,  indicating  that  the  payload  contains  a  node’s  intended  direction  of  travel 
as  shown  in  Figure  9. 
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Figure  8:  NEIGHBOR 
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Figure  9:  DIRECTION 


A  series  of  DISCOVER/NEIGHBOR  messages  among  nodes  within  communication  range 
results  in  an  ad  hoc  network  comprising  connected  nodes  (UAVs).  Similar 
DISCOVER/NEIGHBOR  messages  exchanged  periodically  either  to  gain  new  or  retain 
existing  connections. 

Each  node  in  the  ad  hoc  network  has  information  about  both  the  connectivity  as  well  as 
metadata  of  each  of  its  neighboring  node. 

Each  node  maintains  its  connectivity  with  other  nodes  in  a  connectivity  graph  Gn,  and 
exchanges  connectivity  information  (The  local  Gn)  periodically  with  its  neighbors.  It  is 
likely  that  there  are  one  or  more  such  ad  hoc  networks  and  isolated  nodes  in  the  system  at 
any  given  time.  Gn  is  constructed  through  a  series  of  DISCOVER/NEIGHBOR  message 
exchanges  among  nodes.  Initially,  each  node  starts  as  an  isolated  node  and  adds  neighbors. 
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Algorithm  executed  at  each  node  after  DISCOVERY 
DISCOVER_NEIGHBOR 

{  At  node  TV/ 

if  received  DISCOVER  from  Node  Nj 

Update  adjacency  list  at  Node  iV, 

For  each  node  Nk  in  the  adjacency  list  of  node  iV, 

N=  1 

Send  NEIGHBOR  message  (Updated  adjacency  list,  Meta  data 
information) 

} 


Data  exchange  between  nodes 

Consider  a  snap  shot  of  the  system  shown  in  Figure  lo.  Let  Ni,  N2,  N3  be  three  nodes  in  the 
physical  network.  Data  items  (relations)  R1  and  R2,  R3  are  represented  as  data  nodes  and 
reside  on  nodes  Ni,  N2  and  N3  respectively. 


Figure  10:  Network  snapshot 
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Each  physical  node  in  the  network  exchanges  information  about  what  Relational  data  it  contains  as  a 
part  of  meta  data  exchange  which  is  included  in  the  DISCOVER  message.  Hence  node  N:  knows  that 
nodes  N2  and  N3  contain  data  items  R2  and  R3  respectively.  Similarly  the  other  two  nodes  get  to  know 
that  node  N3  contains  data  item  D2. 

When  a  query  arrives  at  node  Nh  a  query  plan  is  generated  based  on  the  current  connectivity  of  the 
network.  The  generated  plan  contains  operations  that  need  to  be  executed  at  various  nodes  with  the 
query  execution  beginning  at  N3.  Operations  local  to  the  node  N:  are  carried  out  first  and  when  the  first 
MOVE  command  is  executed,  the  results  of  the  partial  query  execution  needs  to  be  transferred  to  the 
next  physical  node  in  the  plan. 

The  flag  M  is  set  to  indicate  that  the  payload  contains  the  following  as  shown  in  Figure  11: 

Result  of  the  local  operations  at  node  NP 
The  query  plan 

A  counter  value  to  indicate  that  the  next  node  in  sequence  needs  to  execute  the  query  starting  at  the 
instruction  pointed  to  by  the  counter  value. 


SP 

SP 

D 

N 

1 

1 

A 

Q 

Nr 

n2 

Timestamp 

Queryplan,Resi, Counter 

Figure  11:  Data  Message 


For  example: 

Let  the  query  Q  that  was  issued  to  node  N:  be  Q=  nAjB(ocl<10  (Ri))  MnB(oc2=20  (R2)) 

From  the  above  mentioned  figure,  Rj  is  on  Node  N2  and  R2  is  on  node  N2. 

Let  the  result  at  Node  N3  be  Res2  =  nA/B(ocl<io  (Ri)) 

Resx  along  with  the  query  plan  and  the  counter  needs  to  be  sent  to  Node  N2  for  further  execution  of  Q. 
At  Node  N2 the  following  query  is  executed  to  give  the  final  result  of  query  Q. 

Res2=Res1MnB(oc2=20  (R2)) 

Algorithm  to  execute  MOVE 
MOVE  (Resi,  Node  N„  Node  Nj ) 

{  Node  Nr.  The  node  where  the  result  of  the  first  part  of  the  query  Q  was  executed. 
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Node  Njm.  The  next  node  in  the  execution  sequence. 

At  node  N, 

If  Eij  exists;  Eij  is  the  edge  between  iV,  and  Nj 

M=1 

Transfer  Resi  from  iV,  to  Nj 
Delete  Result  Resi  from  iV, 

If  path  Pij  exists;  Pyis  the  path  from  iV,  to  Nj 

M=10 

Transfer  Resi  from  TV/  to  next  node  in  path  Pij 
Delete  Result  Resi  from  iV, 

else 

MOVE  cannot  be  executed. 

At  node  Nk'.c an  be  intermediate  node  or  destination  node 
If  DestID=SelfID  copy  Resi. 

If  DestlD^SelflD  transfer  Resi  to  next  node  in  Pij 

} 


The  COPY  operation  executed  at  a  node  as  specified  by  the  query  plan  indicates  that  the 
result  of  query  execution  at  that  particular  node  needs  to  be  sent  to  a  set  of  other  nodes. 

The  flag  M  is  set  to  11  and  the  payload  contains  the  Result  as  per  Figure  12. 


SP 

SP 

D 

N 

I 

11 

A 

Q 

Ni 

n2 

Timestamp 

Result 

Figure  12:  Message  sent  for  the  COPY  operation 
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Algorithm  to  execute  COPY 

COPY  (Result,  NodeList[  ]) 

{  A/,:  Node  at  which  the  copy  operation  is  executed. 

NodeList  [ ]:  Contains  a  list  of  nodes  to  which  the  Result  needs  to  be  copied. 
At  Node  A/,- 

for  each  node  Nj  in  NodeList  [ ] 

If  Ejj exists;  E,y  is  the  edge  between  A/,  and  Nj 

M=ll 

Transfer  Result  from  A/,  to  Nj 

If  P;,  exists;  P,y  is  the  path  between  A/,  and  Nj 

M=10 

Transfer  /?es3  from  N,  to  next  node  in  path  P,y 
else 

COPY  cannot  be  executed  for  node  Nj 

At  node  A4;can  be  intermediate  node  or  destination  node 

If  DestlD=SelflD  copy  ReS]_, 

If  DestlD^SelflD  transfer /?esi  to  next  node  in  P,y 
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7.1  Message  exchange  for  adjacency  list  maintenance  at  each  node 

Each  node  to  indicate  its  presence  in  the  network  sends  periodic  ALIVE  message  to  those 
not  connected  to  it.  This  message  is  essentially  a  multicast  message.  The  flag  A  is  set  to 
indicate  that  the  message  is  an  ALIVE  message.  A  node’s  absence  is  determined  due  to  the 
missing  ALIVE  message  and  changes  are  made  in  the  connectivity  graph  of  each  of  the 
neighbors. 

Also,  a  node  can  voluntarily  leave  a  network  by  sending  a  QUIT  message  (also  multicast) 
and  hence  changes  are  made  in  the  connectivity  graph  of  each  of  the  neighbors.  The  flag  Q 
is  set  to  indicate  that  the  node  is  voluntarily  leaving  the  network.  A  typical  sequence  of 
message  exchanges  between  two  nodes  in  shown  in  Figure  13. 

Algorithm  to  reconstruct  the  Adjacency  list 
RECONSTRUCT_ADJLIST 

{  At  node  TV/ 

If  no  ALIVE  message  is  received  from  Node  Nj  or  QUIT  is  received  from  TV/ 

Remove  node  TV,  from  the  adjacency  list  of  node  TV, 

For  each  node  Nk  in  the  adjacency  list  of  node  iV, 

N=1 

Send  NEIGHBOR  message  (updated  adjacency  list,  Meta  data 
information) 

} 


Sequence  of  message  exchange  between  nodes  in  the  network. 
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Time 


Figure  13:  Message  exchange  between  nodes  from  T=0  to  T=/V units 
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7.2  Replication 


Each  data  object  is  replicated  on  other  node.  In  the  following  we  discuss  the  criteria  for  choosing  an 
appropriate  node  for  replication. 

Assumptions: 

There  exists  only  one  replica  of  a  given  data  item  (Tuples). 

Ns( Source  Node):  The  node  where  the  original  copy  of  data  item  d j-  exists. 

A/c(Candidate  Node):  The  node  which  contains  the  replica  of  d ,. 

When  a  Source  node  Ns  decides  to  replicate  its  contents  on  another  node  Nc,  a  node  from  the 
set  of  the  Nodes  that  are  immediate  neighbors  of  the  source  node  is  selected  as  candidate  node 
for  replication.  Immediate  neighbors  are  those  nodes  which  are  directly  connected  to  the  source 
node.  The  source  node  tries  as  much  to  replicate  all  its  tuples  on  the  chosen  candidate  node. 

For  each  of  the  above  selected  candidate  nodes,  a  cost  function  CS/C  is  computed.  The  node  with 
the  lowest  cost  is  selected  as  a  candidate  for  replication. 

The  cost  function  to  determine  the  candidate  node  for  replication  is  dependent  on  the  following  factors: 

Bandwidth:  Defines  the  closeness  of  /Vcfrom  Ns  in  terms  of  bandwidth.  Greater  the  bandwidth  more 
data  can  be  transmitted  within  a  given  time.  This  is  advantageous  when  there  are  several  tuples  to  be 
sent  across  the  link  in  a  single  update  operation. 

Link  stability:  This  is  a  measure  of  stability  of  the  link  between  nodes  Ns  and  Nc.  Greater  stability  of  the 
link  between  the  two  nodes  implies  better  longevity. 

The  Link  stability  LSsc  measured  at  NS/  is  equal  to  rate  of  change  of  received  signal  strength  for  messages 
sent  by  Nc.  Let  r±  and  r2  be  the  received  signal  strength  at  time  t2  and  t2  respectively.  (t^t^Current  Link 
stability  between  Ns  and  N0  is  given  by, 

dr 

LSsc=  — , Where  dr  =r2-rjand  dt=t2-t2.  If  r^r^then  LSsc  is  positive,  this  indicates  increasing 
dt 

link  stability  between  Ns  and  Nc .  If  r2<r2  then  LSsc  is  negative,  this  indicates  decreasing  link 
stability  between  Ns  and  Nc. 

Degree  of  the  node  Nc:  Greater  the  degree  of  a  node,  better  is  the  accessibility  of  replicated  data. 
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7.2.1 


Cost  function 


When  the  source  node  Ns  decides  to  replicate  its  data  item  d,  on  a  candidate  node 

Table  11:  Cost  function  parameters  maintained  at  each  node  Ns 


Node 

Average  Bandwidth 

Average  Degree 

N 1 

Si 

Di 

n2 

b2 

02 

N, 

B, 

0/ 

Nc,  it  takes  into  account  the  recent  history  (from  the  last  replication  decision)  to  make  the  replication 
decision.  History  includes  information  about  the  above  mentioned  three  factors.  A  simple  average  is 
computed  for  each  of  Bandwidth  and  Degree.  A  table,  such  as  Table  11  is  maintained  to  store  the 
average  values  for  each  node  the  source  node  is  connected  to.  Each  row  in  the  above  table  can  be 
considered  as  a  vector  of  values.  Each  element  in  this  vector  is  normalized  and  has  a  value  between  0 
and  1. 

For  each  candidate  node,  a  table  is  maintained  to  store  the  RSS  values  recorded  in  time.  At  any  point  in 
time,  the  Link  stability  can  be  determined  using  2  consecutive  values  from  the  Table  12.  (t  1<t2<ti) 


Table  12:  RSS  values 


Time 

Relative  Signal  Strength 

ti 

C i 

t2 

r2 

t. 

n 

Consider  the  following  example: 

Let  the  current  value  of  Bandwidth  between  nodes  /V2  and  N2  be  500  kpbs  and  the  maximum  value  of 
bandwidth  be  1Mbps.  The  normalized  value  of  bandwidth  is  500kbps/lMbps,  which  is  equal  to  0.5. 
Similarly,  normalization  is  applied  to  degree. 

For  link  stability  normalization  is  applied  as  follows: 

If  the  node  Ns  decides  to  replicate  it's  tuples  at  say  time  t,-,  it  takes  into  account  the  current 
value  of  RSS  r,-  (time  t,)and  the  previous  value  retime  t;_i)The  current  Link  stability  of  the  link 
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between  Ns  and  Nc  is  equal  to  the  normalized  value  of  rate  of  change  of  RSS.  Let  T  be  the  time 
period  between  the  last  replication  decision  and  the  current  replication  decision.  The  maximum 
value  of  rate  of  change  of  RSS  during  T  is  computed  which  is  used  in  the  normalization. 


Link  stability  LS'sc: 


dr  /  dt 


MAX 


'dr\ 

y  dt  j 


The  above  value  lies  in  the  range  [-1,  1], 

Normalized  link  stability  is  given  by:  (LS'S/C+a)/ (3  where  a=l  and  (3=2.The  normalized  value  has  a 
range  [0,1]. 

After  each  parameter  is  normalized,  a  weight  is  associated  with  it.  The  weight  is  an  indication  of 
the  relative  importance  of  a  parameter  with  respect  to  other  parameters  in  the  computation  of 
the  cost  function.  Let  wb,  tv/and  tvdbe  the  weights  assigned  for  each  of  average  bandwidth,  link 
stability  and  average  degree  respectively. 

Hence  the  cost  function  CS/C  to  identify  the  most  suitable  node  for  replication  is  given  by, 

CS/C= 1/ (  tvfa*average  bandwidth  +  tv;*link  stability  +  wd*  average  degree) 


Also,  wb+  Wi  +  wd  =1. 


Lower  the  value  of  CS/C  for  a  candidate  node  Nc, indicates  it  is  more  suitable  for  replication. 
Weights  for  wb,  tv/and  wd:  The  range  of  values  for  each  of  wb  and  tv/is  [0,0.4]and  wd  is  [0,0.1], 
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Algorithm  to  replicate 

A  node  decides  to  replicate  its  tuples  after  each  time  period  T'. 

Algorithm  to  Check  Connectivity 

CONNECTIVITY  (connectivity  matrix,  Ns) 

{  for  each  Node  Nj  in  connectivity  matrix  of  node  Ns 
If  ESj exists;  ESj  is  the  edge  between  Ns  and  Nj 
Return  Nj 
Else 

Return  null 

} 


Justification  for  the  parameters  used  in  the  cost  function. 

The  cost  function  is  given  by: 

CS/C=l/(  tvfa*average  bandwidth  +  tv,*link  stability  +  wd*  average  degree) 

The  network  in  this  scenario  is  dynamic  and  the  connectivity  between  nodes  (physical  nodes)  keeps 
changing  very  often.  The  above  chosen  parameters  in  the  cost  function  are  ideal  to  select  a  replica. 

Bandwidth: 

During  the  process  of  replication,  depending  on  how  much  data  the  source  node  has  acquired,  the 
amount  of  data  that  needs  to  be  transferred  (during  a  replication  period)  to  the  candidate  node  can  be  a 
lot.  Also,  taking  into  consideration  the  dynamic  nature  of  the  network,  delays  in  transmission  can  lead  to 
some  data  not  being  transferred  to  the  candidate  node.  If  the  candidate  node  is  queried  (instead  of  the 
source  node)  the  result  of  the  query  may  not  be  accurate  due  to  lack  of  tuples  necessary  for  carrying  out 
the  execution  of  the  query.  Bandwidth  plays  an  important  role  in  transmission  delay,  since  greater  the 
bandwidth  more  data  can  be  transmitted  on  the  link  within  less  time.  If  the  source  node  and  the 
candidate  node  are  connected  and  the  bandwidth  between  them  is  high,  data  can  be  replicated  quickly 
on  the  candidate  node. 
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Degree: 


If  the  source  node  of  a  data  item  goes  out  of  range  (isolated)  for  a  long  period  of  time,  any  query 
involving  data  on  source  node  cannot  be  successfully  executed.  If  a  node  'Nc'  is  connected  to  several 
other  nodes,  data  on  'Nc'  will  be  available  to  other  connected  (direct  or  indirect)  nodes  for  a  longer 
period  of  time.  It  is  essential  that  the  replica  of  a  data  item  (tuples)  be  accessible  when  the  actual  source 
of  data  goes  out  of  range.  If  the  candidate  node  has  a  low  degree  then  the  chances  of  it  going  out  of 
range  is  greater.  Hence  degree  of  a  node  plays  an  important  role  in  the  availability  of  data. 

Link  Stability: 

Link  stability  is  a  measure  of  a  source  node's  connectivity  with  the  candidate  node.  Greater  the  Link 
stability  between  2  nodes,  the  more  stable  is  the  connectivity  between  the  two  nodes.  A  source  node 
'Ns'  always  replicates  its  data  items  (tuples)  on  a  single  candidate  node  'Nc'.  If  the  connectivity  between 
'Ns'  and  'Nc'  keeps  changing  very  often  (highly  dynamic)  and  ‘Ns’  has  acquired  new  data  which  it  decides 
to  replicate  on  'Nc',  replicating  data  becomes  hard  and  unmanageable.  If  the  link  stability  between  2 
nodes  is  high,  the  source  node  gets  more  time  to  replicate  its  data.  When  a  query  hits  a  candidate  node, 
it  will  have  the  most  recent  data  (since  the  source  node  can  continuously  transmit  the  recently  acquired 
tuples)  required  for  successful  query  execution. 
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Example  Network 


Figure  14  A  sample  graph  of  14  nodes  in  the  system 


B  =  1  Mbps,  B/2=500Kbps,  B/3=333kbps,  B/4=250kbps 

Plots 


W|  =0.4,  Wb=0.4,  Wd=0.2 
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In  the  above  graph  if  Node  N1  decides  at  t6  to  replicate  its  tuples  on  one  of  N2,  N6  or  N14,  the  best 
node  in  terms  of  cost  (lowest)  is  chosen  for  replication. 

The  cost  calculation  for  each  of  these  nodes  is  shown  below: 

Cost  for  replicating  on  node  N2 

Table  13  RSS  Values  from  tl  to  t5  for  Node  2 


tl 

t2 

t3 

t4 

t5 

t6 

5 

7 

4 

8 

6 

5 

(Maintained  at  Nl) 


Table  14  Parameter  values  for  Node  2 


Current  Value 

Maximum  Value 

Normalized  value 

Bandwidth 

333kbps 

1Mbps 

0.33 

Degree 

2 

14 

0.14 

Link  stability 

-1 

4 

0.37 

Cost  =  1/  ((0.33*0.4)  +  (0.37*0.4)  +  (0. 14*0. 2))=l/0. 30=3.33 
Cost  for  replicating  on  node  N6 

Table  15  Parameter  values  for  Node  6 


Current  Value 

Maximum  Value 

Normalized  value 

Bandwidth 

333kbps 

1Mbps 

0.33 

Degree 

2 

14 

0.14 

Link  stability 

1 

3 

0.66 
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Table  16  RSS  Values  from  tl  to  t5  for  Node  6 


tl 

t2 

t3 

t4 

t5 

t6 

5 

7 

6 

4 

7 

8 

(Maintained  at  Nl) 


Cost  =  1/  ((0.33*0.4)  +  (0.66*0.4)  +  (0.14*0.2))=l/0.42=2.35 

Cost  for  replicating  on  node  N14 

Table  17  Parameter  values  for  Node  14 


Current  Value 

Maximum  Value 

Normalized  value 

Bandwidth 

333kbps 

1Mbps 

0.33 

Degree 

1 

14 

0.07 

Link  stability 

-1 

3 

0.33 

Table  18  RSS  values  from  tl  to  t6  for  Node  N14 


tl 

t2 

t3 

t4 

t5 

t6 

5 

7 

6 

4 

7 

6 

(Maintained  at  Nl) 


Cost  =  1/  ((0.33*0.4)  +  (0.33*0.4)  +  (0. 07*0. 2))=l/0. 27=3.70 

Table  19:  Cost  for  nodes  N2,  N6  and  N14 


Bandwidth 

Degree 

Link  Stability 

Cost 

Node  N2 

0.33 

0.14 

0.37 

3.33 

Node  N6 

0.33 

0.14 

0.66 

2.35 

Node  N14 

0.33 

0.07 

0.33 

3.7 

From  the  above  table,  Node  Nl  chooses  node  N6  as  the  best  node  for  replication  since  the  cost  for 
replication  is  lesser  than  the  other  2  nodes. 
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7.3  Application  Level  mechanism  for  Acknowledging  UDP  packets 


Issues  with  TCP  over  mobile  wireless  network:  The  nodes  in  the  scenario  communicate  with  each  other 
over  wireless  links.  TCP  does  not  scale  well  such  wireless  networks  as  nodes  frequently  disconnect  and 
rejoin.  It  will  be  necessary  to  use  UDP  for  transfer  of  messages/metadata  between  nodes  in  the  current 
environment.  However,  for  critical  applications,  reliable  transfer  of  packets  is  imperative.  In  view  of  the 
above,  we  have  developed  a  simple  mechanism  for  acknowledging  UDP  packets  from  the  application 
layer.  Development  of  the  simple  mechanism  was  preceded  by  a  review  of  existing  literature  on  the 
topic.  Summaries  of  critiques  will  be  provided  if  needed. 


1.  Wireless  links  use  open  air  as  the  transmission  medium  and  are  subject  to  many 
uncontrollable  quality-affecting  factors  such  as  weather  conditions,  urban  obstacles, 
multipath  interferences,  large  moving  objects,  and  mobility  of  wireless  end  devices.  As  a 
result,  wireless  links  exhibit  much  higher  Bit  Error  Rate  (BERs)  than  wired  links. 

2.  Limitations  of  radio  coverage  and  node  mobility  require  frequent  handoffs,  thus 
resulting  in  temporal  disconnections  and  reconnections  between  the  communicating 
end  hosts  during  a  communication  session.  During  a  disconnection  both  the 
retransmission  and  ACK's  will  be  lost,  repeated  failed  retransmissions  results  in  stalling 
the  TCP  transmission  for  a  much  longer  period.  (Slow  start) 


UDP  data  transfer  scheme: 

The  User  Datagram  Protocol  provides  a  procedure  for  application  programs  to  send  data  to  other 
applications  with  minimum  protocol  mechanism.  The  protocol  is  transaction  oriented,  and  delivery  and 
duplicate  protection  are  not  guaranteed.  However  UDP  can  be  used  to  achieve  high  speed  as  well  as 
accurate  results  by  providing  an  acknowledgement  scheme  in  the  application  layer.  Unlike  the  TCP 
protocol  where  the  acknowledgement  is  sent  after  each  packet  is  sent,  by  using  the  UDP  scheme  the 
acknowledgement  for  the  data  received  can  be  sent  only  after  a  block  of  data  is  received.  This  reduces 
the  number  of  acknowledgements  sent  when  compared  to  the  TCP  scheme  hence  increases  the  overall 
speed. 

The  structure  of  a  UDP  frame  includes: 


Source  Port 

Destination  Port 

Length 

Checksum 

Data  Octets 
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Application  layer  acknowledgement  scheme  based  on  UDP: 


In  the  application  layer  at  the  sender  side,  an  INFO  message  which  contains  information  about  the 
number  of  packets  and  the  number  of  tuples  is  sent  to  the  receiver  before  the  actual  data  is 
transmitted. 


The  format  of  the  INFO  message  is: 


SP 

F 

D 

N 

I 

M 

A 

Q 

Source  ID 

Destination  ID 

Timestamp 

Payload 

The  flag  F  is  set  (F=l)  to  indicate  that  the  payload  contains  frame  related  information.  The  payload 
contains  information  about  the  number  of  packets  that  would  follow  ('K')  and  the  number  of  tuples  in 
each  of  the  packets  ('k').  This  information  lets  the  receiver  know  about  the  amount  of  data  to  expect 
from  the  sender  identified  by  the  Source  ID  field.  The  receiver  allocates  some  buffer  space  to 
accommodate  the  incoming  data. 

After  the  INFO  packet  has  been  transmitted,  the  sender  transmits  the  actual  data.  Each  application  layer 
data  packet  (those  which  have  the  flag  set)  contains  information  about  the  ID  for  the  transaction  and 
the  sequence  number  "Seq"  of  the  packet  being  sent.  The  ID:Seq  field  for  the  other  packets  has  a  value 
of  0  to  indicate  that  the  message  contains  control  packets. 


SP 

ID: Seq 

F 

D 

N 

I 

1 

A 

Q 

Source  ID 

Destination  ID 

Timestamp 

Payload 

The  receiver  maintains  2  timers  ACK  and  NAK  which  are  for  acknowledging  and  reporting  loss 
respectively.  These  indicate  the  frequency  of  acknowledgment  (or  NAK)  in  the  network.  If  the  ACK  timer 
expires  and  there  are  new  packets  to  be  acknowledged,  the  receiver  sends  an  ACK  to  the  sender.  The 
receiver  maintains  a  loss  list  which  maintains  information  about  the  number  of  packets  that  were  lost 
during  transmission.  If  the  current  sequence  number  is  greater  than  the  largest  sequence  number  ever 
received  plus  1,  all  the  sequence  numbers  between  these  two  numbers  will  be  inserted  into  the 
receiver's  loss  list.  If  the  NAK  timer  expires  and  the  loss  list  at  the  receiver  is  nonempty  then  the  receiver 
sends  a  NAK  to  the  sender. 

On  receiving  a  NAK,  the  sender  retransmits  the  packet.  Until  an  ACK  is  received  the  sender  maintains 
information  about  all  the  packets  that  have  been  sent  in  this  session  (indicated  by  ID)  in  the  sender's 
ToBeACK  list.  On  receiving  acknowledgement  report  for  a  particular  packet  (or  set  of  packets)  from  the 
receiver,  information  about  these  packets  is  removed  from  the  ToBeAck  list. 

The  format  of  the  NA  packet  is: 


NA 

ID: Seq 

F 

D 

N 

I 

M 

A 

Q 

Source  ID 

Destination  ID 

Timestamp 

Payload 
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The  field  NA  is  set  to  10  to  indicate  a  loss  report.  The  payload  contains  information  about  the  packet 
numbers  not  yet  received.  The  field  NA  is  set  to  11  which  indicate  the  acknowledgment  till  the  last 
correctly  received  packet. 

Algorithm  for  sending  data  (at  the  sender): 

The  algorithm  "Send"  transmits  data  "Data"  to  the  receiver  "Destination  ID". 

Send  (Data,  Destination  ID)  { 

Send  control  information  by  setting  the  flag  F=1  of  the  INFO  message  to  the  receiver. 

Send  data  (tuples)  using  UDP  to  receiver  (Destination  ID) 

Add  the  ID:  Sequence  number  in  the  ToBeACK  list. 

On  receiving  an  ACK,  remove  the  Sequence  number  (s)  from  the  ToBeACK  list. 

On  receiving  a  NAK,  retransmit  the  packet(s)  to  the  receiver. 

} 

Algorithm  for  receiving  data  (at  the  receiver): 

The  Receive  algorithm  is  executed  at  the  receiver. 

Receive  (Data,  Sender  ID,  Destination  ID)  { 

After  receiving  the  INFO  message,  start  the  ACK  and  NACK  timer. 

Receiving  packet:  if  Sequence  number  >  Sequence  number  in  buffer  +1 
Add  packet  to  loss  list. 

If  NAK  expires  and  loss  list  is  nonempty 

Send  NAK  report  to  sender 

If  ACK  expires  and  there  are  pending  acknowledgements  to  be  sent  to  the  Sender 
Send  an  ACK  report  to  the  sender. 
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8.  SIMULATION  STUDIES 


8.1  Simulation  Architecture 

Technical  Specifications: 


Table  20  Specifications  for  Simulation 


MODEL 

TECHNOLOGY 

Front  end 

JAVA  AWT 

Back  end 

ORACLE  10G 

Controller 

JDBC 

Languages 

JAVA,  PL/SQL 

SDE 

ECLIPSE 

8.1.1  Description  or  Setup: 

A  mobile  Peer-to-Peer  network  has  been  set  up  with  a  maximum  of  12  UAV's.  All  the  UAV's  are 
within  an  area  of  140  X  140  miles  (area  of  20,000  sq.  miles)  distance. 

Each  UAV  sends  a  DISCOVER  message  as  soon  as  it  enters  the  network.  UAV's  within  the 
transmission  range  receive  this  DISCOVER  message  and  acknowledge  it  by  sending  a  NEIGHBOR  message 
back.  A  links  is  then  created  between  these  UAV's.  UAV's  also  keep  sending  DISCOVER  message 
periodically  to  find  new  UAV's  reachable. 

An  ALIVE  message  is  sent  periodically  by  all  the  UAV's  to  indicate  their  presence  in  the  network.  An 
ALIVE  message  also  contains  the  connectivity  graph  of  the  UAV. 

Bandwidth  of  the  link  is  1Mbps  if  both  the  UAV's  have  only  one  link;  it  is  500kbps  if  at  least  one  UAV 
has  2  links  and  so  on.  The  bandwidth  of  the  link  also  depends  on  the  length  of  the  link  (Distance 
between  UAV's).  The  maximum  bandwidth  is  1Mbps  till  a  distance  of  30  miles  and  bandwidth  decreases 
to  180  kbps  as  the  distance  increases  to  60  miles. 

UAV  "0"  acts  as  the  Broker  node  or  Root  node.  UAV's  which  have  a  direct  link  to  UAV  "0"  are  called 
Level  1  nodes.  The  remaining  UAV's  are  level  2  nodes  or  normal  nodes. 


8.1.2  Links  in  the  Network- 

Each  UAV  tries  to  create  a  link  with  all  other  UAV's  within  its  area.  The  bandwidth  then  depends  on 
the  number  of  links  present  at  both  the  nodes.  Since  UAV  '0'  is  a  root  node,  we  allow  only  3  connections 
for  UAV  'O'.  All  other  UAV's  can  have  any  no.  of  links. 
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8.1.3  Movement  of  UAV's 

The  speed  of  all  the  UAV's  is  set  to  100  nautical  miles/hour. 

Random  Way  point  Model:  UAV's  move  around  within  the  area  randomly.  Random  Way  Point  model  has 
been  implemented  for  the  movement  of  UAV's.  As  the  UAV's  move,  bandwidth  of  the  links  change 
based  on  the  distance  between  them.  Also  links  are  broken  and  new  links  formed  based  on  the  distance. 
UAV's  may  not  move  always.  Sometimes  they  stay  at  fixed  position  and  this  also  has  been  simulated. 
Fixed  Trajectory:  Each  UAV  is  assigned  a  fixed  path  when  it  is  created.  UAV's  can  be  asked  to  move  in 
this  path.  Fixed  path  trajectory  has  been  implemented  for  it.  The  speed  for  this  fixed  trajectory  is  also 
the  same.  UAV's  either  move  randomly  or  along  the  path  fixed.  Both  of  them  are  not  possible  at 
the  same  time. 

Circular  movement  of  UAVs:  On  addition  to  the  fixed  path  trajectory  having  a  polygon  of  any  random 
shape,  we  have  incorporated  the  circular  fixed  path  trajectory  also.  The  current  system  has  both  circular 
path  fixed  trajectory  and  polygon  shaped  fixed  path  trajectory.  The  trajectory  can  be  viewed  in  the 
simulation  environment  by  entering  the  node  label  in  the  text  box  beside  show  path  and  clicking  show 
path. 

8.1.4  Routing  Algorithm 

Since  the  number  of  UAV's  are  less,  a  two  layer  routing  protocol  has  been  implemented.  For  this 
each  UAV  broadcasts  information  about  its  neighbors  and  the  distance  between  them  (i.e.  each  UAV 
broadcasts  its  neighbor  table).  This  is  done  by  sending  an  ALIVE  message  periodically. 

Using  all  the  received  ALIVE  messages,  each  UAV  has  an  overview  of  the  graph  (length  of  edges  and 
UAV's)  in  the  first  layer.  The  graph  is  stored  in  the  memory  of  the  UAV  and  is  updated  as  soon  as  ALIVE 
messages  are  received.  So  each  UAV  has  the  most  recent  possible  graph  structure. 

To  also  reduce  the  time  taken  to  find  a  route  between  UAV's,  each  UAV  finds  a  route  to  all  other 
UAV's  and  stores  it  in  the  second  layer.  These  routes  are  found  from  the  graph  in  the  first  layer  using 
Dijkstra's  algorithm. 

Whenever  an  existing  link  is  broken,  the  UAV's  involved  send  an  ALIVE  message  again.  This  is  done 
to  update  the  graph  structure  at  every  UAV.  UAV's  can  then  decide  whether  to  find  a  route  again  or  to 
use  an  existing  one. 

When  a  new  link  is  formed,  the  UAV's  do  not  broadcast  data  about  the  new  link.  This  is  because  as 
soon  as  the  time  period  expires  each  UAV  broadcasts  an  ALIVE  message  again,  so  the  new  link 
information  is  also  broadcasted.  Since  we  are  not  sure  if  this  link  will  help  decrease  the  cost  incurred  by 
a  considerable  level,  we  wait  for  the  timer  to  expire  rather  than  broadcasting  about  the  new  link  again. 

8.1.5  Orientation 

We  have  each  UAV  rotating  at  four  specific  angles  (0,  \\/2,  TL  2TT)-  We  assume  that  each  UAV 
will  be  rotating  to  any  one  of  these  angles  for  every  one  minute.  We  then  reduce  the  initial  available 
bandwidth  based  on  distance  by  subtracting  it  with  a  reduction  factor.  The  reduction  factor  depends  on 
the  orientation  of  the  UAV's,  there  are  16  such  cases  for  four  angles  as  discussed  in  the  Table  21  below. 
We  are  categorizing  UAV's  as  UAVland  UAV2  based  on  their  location  in  the  simulation  area. 
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Let  UAV  (v),  UAV  (w)  be  any  two  UAV's. 

Case  1:  If  (X  axis  value  of  UAV  (v)  >  X  axis  value  of  UAV  (w))  UAV1  =  UAV  (w)  &  UAV2=  UAV  (v) 

Else  UAV1  =  UAV  (v)  &  UAV2=  UAV  (w);  Shown  in  Figure  15. 

Case  2:  If  (Y  axis  value  of  UAV  (v)  >  Y  axis  value  of  UAV  (w))  UAV1  =  UAV  (w)  &  UAV2=  UAV  (v) 

Else  UAV1  =  UAV  (v)  &UAV2=  UAV  (w);  Shown  in  Figure  16. 

The  ideology  of  using  case  1  is  to  determine  the  left  and  right  UAV's  in  the  simulation  area.  Casel 
ensures  that  UAV1  will  be  always  to  the  left  of  UAV2. 


Figure  15  Relative  positions  of  UAVs:  Case  1 

There  are  cases  in  which  the  X  axis  of  the  UAV  might  be  the  same,  i.e.)  they  might  be  lying  in  the  same 
vertical  axis.  Flence,  we  have  case  2  determining  top  and  bottom  UAV's  in  the  simulation  area.  Case  2 
ensures  that  UAV1  is  always  to  the  bottom  of  UAV2. 


Figure  16  Relative  positions  of  UAVs:  Case  2 
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Table  21  Reduction  factors  for  the  two  UAVs 


NO 

UAV1 

UAV2 

REDUCTION  FACTOR 

Case  1 

REDUCTION  FACTOR 

Case  2 

1 

0 

0 

5k 

5k 

2 

TT/2 

0 

6k 

3k 

3 

TT 

0 

7k 

4k 

4 

3TT/2 

0 

6k 

6k 

5 

0 

TT/2 

3k 

6k 

6 

TT/2 

TT/2 

5k 

5k 

7 

TT 

TT/2 

6k 

6k 

8 

3TT/2 

TT/2 

4k 

7k 

9 

0 

TT 

0 

4k 

10 

TT/2 

TT 

3k 

3k 

11 

TT 

TT 

5k 

5k 

12 

3TT/2 

TT 

3k 

6k 

13 

0 

3TT/2 

3k 

3k 

14 

TT/2 

3TT/2 

4k 

0 

15 

TT 

3TT/2 

6k 

3k 

16 

3TT/2 

3TT/2 

5k 

5k 

Constant  k=  6 

Bandwidth  after  orientation=Original  Bandwidth  -  Reduction  Factor 

The  reduction  factor  is  designed  by  assuming  the  antenna  to  be  in  front  of  the  UAV's.  For  e.g.,  in  row 
number  9  of  case  1,  UAV1  is  0  degree  and  UAV2  is  180  degree,  since  both  UAV's  are  facing  each  other  it 
will  get  the  maximum  bandwidth,  as  shown  below  in  Figure  17. 


Figure  17  UAVs  facing  each  other 
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Whereas,  if  we  consider  row  number  3  of  case  1,  UAV1  is  180  degree  and  UAV2  is  at  0  degree;  both 
UAV's  are  facing  opposite  to  each  other  as  shown  below,  hence  this  event  bears  the  highest  reduction 
factor. 


Figure  18  UAVs  facing  in  opposite  directions 


8.2  NETWORK  CONFIGURATION  &  MESSAGE  PROTOCOL 

8.2.1  ASSUMPTION 

All  UAV's  (also  called  as  nodes  here)  are  homogenous  and  they  each  carry  a  device  that  has 
capabilities  similar  to  a  laptop  and  that  has  support  for  wireless  communication.  The  number  of  nodes 
ranges  from  10-16.  The  message  format  for  the  various  messages  exchanged  is  similar  and  it  contains 
multiple  flags  to  indicate  the  type  of  data  being  sent  in  the  payload.  The  address  of  each  UAV  is  of  length 
5  bits  and  it  is  unique  to  each  UAV. 


Msg.  Id 

D 

A 

L 

M 

S 

Source  ID 

Destination  ID 

Timestamp 

Payload 

D-Discover 

A-Alive 

L-Link 

M-Message/Data 

S-Status 

Spare  Flags,  reserved  for  future  use 
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8.2.2  SETUP 

Initially  there  is  a  single  node  that  exists  in  the  network. 

At  say  time  T=tl  there  is  a  node  which  comes  into  the  range.  The  new  node  sends  a  neighbor  discovery 
DISCOVER  message.  The  DISCOVER  message  contains 

•  The  source  identifier  (ID). 

•  The  destination  ID  is  typically  a  Broadcast  address. 

•  A  Timestamp. 

•  A  list  of  data  it  contains. 

•  The  Flag  D  when  set  indicates  that  the  payload  contains  information  about  the  data  on  that 
node. 

If  the  node  is  the  only  one  in  the  network  for  the  current  time  slot,  it  sends  up  to  'N'  neighbor  DISCOVER 
messages  and  decides  that  it  is  the  only  one  in  the  network.  The  DISCOVER  messages  are  sent 
periodically  to  gain  new  connections.  Say  at  time  T=t2,  there  are  other  nodes  which  come  into  the 
range.  Each  of  these  nodes  send  a  similar  DISCOVER  message.  When  an  already  existing  node  receives  a 
DISCOVER  message,  if  the  node  is  within  a  distance  of  40  miles  and  if  it  has  free  channels  available,  it 
responds  with  an  acknowledgement.  A  node  does  this  by  sending  an  ALIVE  message.  The  ALIVE  message 
contains: 

•  The  source  identifier  (ID). 

•  The  destination  ID  (which  has  previously  sent  the  DISCOVER  message). 

•  The  connectivity  graph  /  Adjacency  matrix. 

•  The  Flag  A  when  set  indicates  that  the  payload  contains  information  about  the  nodes 
connectivity. 

If  all  the  nodes  in  the  network  have  3  connections  (i.e.  if  all  the  channels  are  taken  up),  one  node  is 
chosen  to  form  a  link.  This  node  is  found  from  the  set  of  nodes  within  a  distance  of  30  miles  from  the 
node  sending  DISCOVER  message.  Nodes  farther  than  30  miles  do  not  reply  to  the  DISCOVER  message. 
The  selected  node  breaks  an  existing  link  by  sending  a  LINK  message.  It  then  replies  with  an  ALIVE 
message  to  the  node  sending  DISCOVER  message.  The  LINK  message  contains: 

•  The  source  identifier  (ID). 

•  The  destination  ID  (the  node  to  which  link  is  broken) 

•  A  Timestamp 

•  Flag  Lis  set. 

If  a  node  is  many  hops  away  from  any  node,  LINK  message  can  also  be  used  to  break  an  existing  link  and 
create  a  new  link.  This  is  done  so  that  a  least  used  link  is  broken  and  a  link  is  created  between  nodes 
which  are  many  hops  away.  The  least  used  link  is  found  from  the  routing  table.  Weight  is  assigned  to 
each  link  based  on  the  number  of  nodes  it  can  reach  using  that  link.  The  link  with  least  weight  is  broken 
to  create  a  new  link. 

If  the  node  receiving  LINK  message  has  a  link  with  the  source,  that  link  is  broken.  If  the  node 
receiving  LINK  message  does  not  have  any  link  with  source  and  if  the  nodes  are  within  a  distance  of  30 
miles,  a  new  link  is  created  with  the  source.  A  new  link  is  acknowledged  by  sending  an  ALIVE  message 
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back.  After  this  initial  setup  phase  each  node  in  the  network  has  information  about  both  the 
connectivity  as  well  as  the  data  that  each  of  its  neighbors  nodes  possess.  The  connectivity  information 
helps  in  routing.  Routing  is  required  for  those  nodes  which  want  to  send  data  to  other  nodes  which  are 
not  directly  connected. 


8.2.3  CONSTRUCTION  OF  GN 

Each  node  broadcasts  a  ALIVE  message  periodically.  This  ALIVE  message  also  contains  the 
neighbor  table  like  structure  for  each  node.  This  table  contains  the  nodes  it  can  connect  to  and  the 
distance  to  each  of  them. 


Figure  19  Graph  representing  4  nodes 


From  Figure  19,  information  broadcasted  by  node  1  is: 

1  0  Distance  between  0  and  1 
1  2  Distance  between  1  and  2 
1  3  Distance  between  1  and  3 

Each  node  broadcasts  such  information  to  the  whole  network  using  the  links.  An  adjacency  matrix  with 
the  distance  is  created  at  every  node  from  this  information.  This  adjacency  matrix  is  used  to  find  the 
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shortest  route  to  the  nodes  in  the  network.  The  shortest  route  is  found  using  Dijkstra's  algorithm  on  the 
adjacency  matrix.  At  any  point  of  time,  there  are  18  links  in  the  network  with  12  nodes.  So  the  amount 
of  link  information  sent  is  18  lines.  The  routing  table  thus  formed  is  used  to  find  the  least  used  link  and 
also  to  check  the  connectivity  in  the  network.  The  routing  table  for  a  network  is  stored  in  the  following 
format  in  Table  22. 


Table  22  Routing  table 

Routing  Table  for  2  : 

2  47  0 
2  11  1 

243 
2  4 

2475 
2  6 
247 
2  43  8 
2  11  9 
2  47  10 
2  11 


Row  0  shows  the  shortest  route  from  node  2  to  node  0. 

Row  1  shows  the  shortest  route  from  node  2  to  node  1  and  so  on. 

If  any  row  is  empty,  then  the  node  is  not  able  to  reach  that  node.  It  tries  to  connect  to  that  node  by 
sending  LINK  message  to  that  node. 

If  the  no.  of  hops  to  any  node  is  more,  (for  e.  g.,  if  the  10th  row  is  2  4  7  0  1  10—  six  hops)  node  2  tries  to 
establish  a  direct  connection  with  node  10  if  it  is  within  reachable  distance.  The  link  to  be  broken  is 
decided  based  on  the  weights.  The  weights  of  the  links  are: 

Link  2-4  —7 
Link  2-6  —  1 
Link  2-11— 3 

Since  the  link  2-6  has  less  weight,  this  link  is  broken  to  form  a  new  link  with  10. 

Each  node  to  indicate  its  presence  in  the  network  sends  periodic  ALIVE  message  to  those  who  are 
connected  to  it.  This  message  is  essentially  a  broadcast  message.  This  helps  in  maintaining  the  updated 
distances  and  bandwidth  of  the  links.  With  this  algorithm,  information  about  all  the  links  is  known  to  all 
the  nodes  in  the  network.  This  helps  in  finding  the  best  shortest  path  in  the  network.  In  the  previous 
algorithm,  the  information  is  only  shared  with  the  neighbors.  The  neighbors  then  share  this  information 
in  the  next  cycle.  This  increases  the  time  to  update  the  table  at  the  other  end  of  the  network.  This 
network  can  also  change  by  the  time  information  reaches  other  end.  Using  this  mechanism,  the  updated 
information  is  known  to  all  the  nodes  and  helps  in  finding  the  best  route  every  time. 
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8.2.4  RECONSTRUCTION  OF  GN 


If  no  ALIVE  message  is  received  from  a  node,  that  node  is  considered  to  be  disconnected.  The 
neighboring  nodes  then  broadcast  the  routing  table  again  and  also  broadcast  a  DISCOVER  message. 
DISCOVER  message  is  broadcasted  because  there  is  free  channel  available. 

Whenever  an  ALIVE  message  is  received  by  any  node,  the  distance  between  the  nodes  is 
updated  and  the  shortest  route  to  every  node  is  found  again. 


8.2.5  GRAPH  MAINTENANCE 

Graph  structure  of  the  network  is  maintained  by  storing  the  distance  between  the  nodes.  This 
distance  is  updated  whenever  a  ALIVE  message  is  received.  The  ALIVE  message  is  broadcasted  under 
these  conditions: 

1.  When  acknowledging  a  new  link 

2.  Periodically  to  update  the  distance  between  nodes. 

3.  Whenever  a  link  is  deleted  from  the  network. 

8.3  METADATA  GENERATION 

The  metadata  information  is  stored  at  each  node.  Every  node  will  broadcast  its  MIO 
information.  Every  other  node  computes  the  metadata  information  for  data  present  at  other  nodes 
using  the  received  MIO  information.  The  metadata  information  metrics  are  cardinality,  Width,  Min  value 
and  Max  value.  These  metrics  are  used  for  computing  cost  during  query  plan  generation.  The  metadata 
table  with  metadata  information  metrics  is  shown  in  Table  23. 


We  are  using  Triggers  to  update  the  metadata  metrics,  when  new  data  are  inserted  into  the  MIO  table. 


Object 

Node  ID 

Size 

Object 

Latitude 

Longitude 

Time 

Received 

Type 

Description 

Stamp 

Time  Stamp 

MIO  TABLE  ATTRIBUTES 

Table23:  METADATA  TABLE 


Name 

Type 

card 

width 

Min  value 

Max  value 

Range 

(unique  values) 

Object  Type 

Rel 

Size 

number 

Object  Description 

Varchar 

Latitude 

number 

Longitude 

number 

Time  Stamp 

Varchar 
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8.4  REPLICATION 


There  exists  only  one  replica  of  any  data  item  prevailing  in  the  system.  We  have  the 
source  node  containing  the  original  data  item  and  one  of  the  candidate  nodes  containing  the  replica. 
The  candidate  nodes  are  one  hop  neighbors  of  the  source  node;  hence  replication  is  restricted  to  one 
hop.  Each  node  will  have  many  candidate  nodes;  hence  the  choice  of  replicating  a  data  item  at  a 
candidate  node  is  based  on  a  cost  function  comprising  Bandwidth,  Link  Stability  and  Degree  of  Node. 

Wb,W|,Wd  Weights  assigned  for  bandwidth,  link  stability  and  degree  of  node. 

CostSjC=  l(Wb*average  bandwidth  +  W*  link  stability  +  Wd  *  average  degree) 

Once  a  node  replicates  its  data  to  one  of  its  neighbors,  it  places  the  replica  information  in  the  replication 
mapping  table.  The  replication  mapping  table  will  be  maintained  at  each  node.  The  replication  mapping 
table  contains  information  about  name  of  the  data  item  being  replicated,  original  node  and  the 
replicated  node. 

8.5  STATIC  QUERY  PLAN  EXECUTION 

On  an  environment  at  which  partial  meta-data  is  broadcasted,  i.e.  Meta  data  containing  only 
data_desc  and  source  information,  we  have  static  query  plans  to  evaluate  queries  requiring  other  meta¬ 
data  information  such  as  latitude,  longitude  etc.  A  query  plan  is  a  numbered  sequence  of  steps  that  can 
be  easily  interpreted  and  executed  at  any  node  as  shown  below. 


Operation  1 

Param 

Operandl 

Operandl 

Loc 

Operand2 

Operand  2 

Location 

Result 

Name 

Result 

Loc 

Operation  2 

Param 

Operandl 

Operandl 

Loc 

Operandl 

Operand2 

Loc 

Result 

Name 

Result 

Loc 

Operation  n 

Param 

Operandl 

Operandl 

Loc 

Operandl 

Operand2 

Loc 

Result 

Name 

Result 

Loc 

The  query  execution  proceeds  as  follows.  The  plan  is  sent  to  the  node  in  which  the  first  operation  takes 
place.  The  interpreter  in  that  node  uses  the  plan  counter  to  execute  as  many  steps  as  possible  in  that 
node.  When  a  move  is  encountered,  it  sends  the  meta-data  results  as  well  as  the  plan  to  the  node 
mentioned  in  Result  Loc  of  the  plan.  This  process  continues  until  the  last  step  of  the  plan  is  executed. 
The  final  node  containing  meta-data  result  retrieves  the  needed  Data  either  from  replica  or  original 
node. 
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We  have  the  following  set  of  operations  SELECT,  MOVE  and  JOIN;  Param  indicates  the  condition  for 
select  and  join  operations.  Operandl  &  Operand2  are  variables  used  for  join  operation.  The  result  of 
select  and  join  operation  are  placed  in  the  Result  Name  variable. 

"SELECT  *  FROM  UAV1  Rh  UAV2  R*  UAV3  R3  WHERE  Rh timestamp  LIKE  k  AND  R2.timestamp  LIKE  j  AND 
R3.Obj_type='l'  AND  R1.latitude>R2.latitude  AND  R2.longitude>R3.longitude" 
k,  j  <-  Input  timestamps  given  by  the  user. 

The  query  plan  for  the  above  query  is  shown  in  the  table  24.  The  query  plan  is  given  to  node  0  by 
default,  which  follows  execution  as  per  the  plan. 


Table  24:  Query  plan 


Operation 

Sub  SQL  &  JOINS 

Operandl 

Operandl 

Loc 

Operand  2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  LATITUDE 

FROM  UAV1  WHERE 
TIME_STAMP  LIKE  k 

Ri 

1 

Null 

Null 

Ri 

1 

Move 

Null 

Ri 

1 

Null 

Null 

Ri  2 

2 

Select 

SELECT 

LATITUDE, LONGITU 

DE  FROM  UAV2 

WHERE 

TIME_STAMP  LIKE  j 

r2 

2 

Null 

Null 

r2 

2 

Join 

Latitude  >  Latitude 

Ri  2 

2 

r2 

2 

Ji 

2 

Move 

Null 

Ji 

2 

Null 

Null 

J 12 

3 

Select 

SELECT  *  FROM 

UAV3  WHERE 

OBJ_TYPE=1 

r3 

3 

Null 

Null 

r3 

3 

Join 

Longitude  > 

Longitude 

r3 

3 

J 12 

3 

J2 

3 

Move 

Null 

J2 

3 

Null 

Null 

J37 

7 

We  also  simulated  query  plans  for  the  following  scenarios. 

1.  Get  all  information  from  UAV  taken  within  last  five  minutes  of  the  area  bounded  by  latitudel, 
longitudel  and  Iatitude2,  longitude2. 
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Query: 

SELECT  *  FROM  UAV1  Rh  UAV2  R2  WHERE  Rptimestamp  >time  AND  R2.timestamp  <  time  AND 
Rptimestamp  >  time  AND  R2.timestamp  <  time  AND  R1.latitude>R2.latitude  AND 
Rp  long  itude>R2.  longitude 


Query  Plan: 


Operation 

Sub  SQL&  JOINS 

Operandl 

Operandl 

Loc 

Operand  2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  *  FROM  UAV1  R1 
WHERE  R1.time_stamp 

>to_date('08-dec-2010 

09:41:30 

PM','dd/mm/yyyyhh:mi:ss  PM1) 
AND  R1.time_stamp  < 
to_date('08-dec-2010  09:41:32 
PM','dd/mm/yyyyhh:mi:ss  PM1) 

Ri 

1 

Null 

Null 

R11 

1 

Move 

Null 

Ru 

1 

Null 

Null 

R11 

2 

Select 

SELECT  *  FROM  UAV2  R2 
WHERE  R2.time_stamp 

>to_date('08-dec-2010 

09:41:30 

PM','dd/mm/yyyyhh:mi:ss  PM1) 
AND  R2.time_stamp  < 
to_date('08-dec-2010  09:41:32 
PM','dd/mm/yyyyhh:mi:ss  PM1) 

r2 

2 

Null 

Null 

R21 

2 

Join 

Latitude>  Latitude  and 

Longitude  >  Longitude 

Ru 

2 

R21 

2 

Ji 

2 

Move 

Null 

Ji 

2 

Null 

Null 

Ji 

3 

The  date  given  in  the  query  and  query  plan  are  sample  dates,  our  simulation  run  with  respect  to  current 
time.  It  retrieves  data  taken  by  UAVs  since  last  5  minutes. 

2.  Get  all  images  taken  within  last  five  minutes  of  the  area  bounded  by  latitudel,  longitudel  and 
Iatitude2,  longitude2  within  2  NM,  3NM. 
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Query: 

SELECT  *  FROM  UAV1  Rh  UAV2  R2  WHERE  R  ^timestamp  >11/3/2010  AND  R2.timestamp  >11-3-2010 
AND  Rptimestamp  <12/1/2010  AND  R2.timestamp  >12-1-2010  AND  R1.latitude>2  AND  R1.longitude>2 
and  R2.longitude>3  AND  R2.longitude>3  and  R1.0BJ_TYPE=1  AND  R2.0BJ_TYPE=1 


Query  Plan: 


Operation 

Sub  SQL&  JOINS 

Operandl 

Operandl 

Loc 

Operand  2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  *  FROM  UAV1  R1 
WHERE  R1.time_stamp 

>to  date('08-dec-2010 

09:41:30 

PM','dd/mm/yyyyhh:mi:ss  PM') 
AND  Rl.time  stamp  < 

to_date('08-dec-2010  09:41:32 
PM','dd/mm/yyyyhh:mi:ss  PM') 
And  R1.latitude>2  AND 
R1.longitude>3 

And 

Rl.obj  type=1 

Ri 

1 

Null 

Null 

R11 

1 

Move 

Null 

Rn 

1 

Null 

Null 

R11 

2 

Select 

SELECT  *  FROM  UAV2  R2 
WHERE  R2.time_stamp 

>to  date('08-dec-2010 

09:41:30 

PM','dd/mm/yyyyhh:mi:ss  PM') 
AND  R2.time  stamp  < 
to_date('08-dec-2010  09:41:32 
PM','dd/mm/yyyyhh:mi:ss  PM') 
And  R2.latitude>2  AND 
R2.longitude>3 

And 

R2.obj_type=1 

r2 

2 

Null 

Null 

R21 

2 

UNION 

Rn 

2 

R21 

2 

Ui 

2 

Move 

Null 

Ui 

2 

Null 

Null 

Ui 

3 
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3.  Get  all  "Data"  locations  within  12  NM  of  the  area  bounded  by  latitudel,  longitudel  and  Iatitude2, 
longitude2. 


Query: 

SELECT  UAV1. latitude,  UAV2. latitude,  UAV1. longitude,  UAV2.  Longitude  FROM  UAV1  Rh  UAV2  R2 
WHERE  R1.latitude>12  AND  R2.latitude>  12  AND  R1.longitude>12  AND  R2.longitude>12  AND 
R1.0BJ_DESC="Data"  and  R2.0BJ_DESC="Data" 


Query  Plan: 


Operation 

Sub  SQL&  JOINS 

Operandl 

Operandl 

Loc 

Operand  2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  *  FROM 

UAV1  R1  WHERE 
R1.latitude>12  AND 
R1.longitude>12 

And 

R1.0BJ  DESC=”Data” 

Ri 

1 

Null 

Null 

R11 

1 

Move 

Null 

Ru 

1 

Null 

Null 

R11 

2 

Select 

SELECT  *  FROM  UAV2 
R2  WHERE 

R2.latitude>12  AND 
R2.longitude>12 

And 

R2.0BJ_DESC=”Data” 

r2 

2 

Null 

Null 

R21 

2 

Union 

Ru 

2 

R21 

2 

Ui 

2 

Project 

Latitude,  Longitude 

Ui 

2 

Null 

Null 

Pi 

2 

Move 

Pi 

2 

Pi 

3 
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4.  Get  all  "Image"  locations  within  2  NM  of  the  area  bounded  by  latitude!.,  longitudel  and  Iatitude2, 
longitude2. 


Query: 

SELECT  UAV1. latitude,  UAV2. latitude,  UAV1. longitude,  UAV2.  Longitude  FROM  UAV1  Rh  UAV2  R2 
WHERE  R1.latitude>2  AND  R2.latitude>  2  AND  R1.longitude>2  AND  R2.longitude>2  AND 
R1.0BJ_DESC="lmage"  and  R2.0BJ_DESC="!mage" 


Query  Plan: 


Operation 

Sub  SQL&  JOINS 

Operandl 

Operandl 

Loc 

Operand  2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  *  FROM  UAV1 
R1  WHERE 

R1.latitude>2  AND 

R1.longitude>2 

And 

R1.0BJ  DESC-’lmage” 

Ri 

1 

Null 

Null 

R11 

1 

Move 

Null 

Ru 

1 

Null 

Null 

R11 

2 

Select 

SELECT  *  FROM  UAV2 
R2  WHERE 

R2.latitude>2  AND 

R2.longitude>2 

And 

R2.0BJ_DESC=”lmage” 

r2 

2 

Null 

Null 

R21 

2 

Union 

Ru 

2 

R21 

2 

Ui 

2 

Project 

Latitude,  Longitude 

Ui 

2 

Null 

Null 

Pi 

2 

Move 

Pi 

2 

Pi 

3 
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5.  Get  the  latitude,  longitude,  and  elevation  for  any  object  of  type  rocket  launcher  in  the  area  bounded 
by  latitudel,  longitude!  and  Iatitude2,  longitude2. 


Query: 

SELECT  UAV1. latitude,  UAV2. latitude,  UAV1. longitude,  UAV2.  Longitude  FROM  UAV1  Rh  UAV2  R2 
WHERE  R1.latitude>R2.latitude  AND  R1.longitude>R2.longitude 


Query  Plan: 


Operation 

Sub  SQL&  JOINS 

Operandl 

Operandl 

Loc 

Operand  2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  *  FROM  UAV1 

R1 

Ri 

1 

Null 

Null 

R11 

1 

Move 

Null 

Rii 

1 

Null 

Null 

R11 

2 

Select 

SELECT  *  FROM  UAV2 

R2 

r2 

2 

Null 

Null 

R21 

2 

Join 

Latitude>  Latitude  and 
Longitude  >  Longitude 

Rii 

2 

R21 

2 

Ji 

2 

Project 

Latitude,  Longitude 

J1 

2 

Pi 

Move 

Null 

Pi 

2 

Null 

Null 

Pi 

3 
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6.  Get  the  current  ceiling  and  visibility  in  the  area  bounded  by  latitude!.,  longitudel  and  Iatitude2, 
longitude2. 


Query: 

SELECT  UAVl.obj_desc,  UAV2.obj_desc  FROM  UAV1  Rh  UAV2  R2  WHERE  R1.latitude>R2.latitude  AND 
Rt.  long  itude>R2.  longitude 


Query  Plan: 


Operation 

Sub  SQL&  JOINS 

Operandl 

Operandl 

Loc 

Operand  2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  *  FROM  UAV1 
R1 

Ri 

1 

Null 

Null 

R11 

1 

Move 

Null 

Rn 

1 

Null 

Null 

R11 

2 

Select 

SELECT 

*  FROM  UAV2  R2 

r2 

2 

Null 

Null 

R21 

2 

Join 

Latitude>  Latitude  and 
Longitude  >  Longitude 

Rn 

2 

R21 

2 

Ji 

2 

Project 

OBJ_DESC 

J1 

2 

Pi 

Move 

Null 

Pi 

2 

Null 

Null 

Pi 

3 

APPROVED  FOR  PUBLIC  RELEASE;  DISTRIBUTION  UNLIMITED. 


66 


7.  Get  all  forward  blue  force  positions  within  2  NM  of  the  area  bounded  by  latitudel,  longitudel  and 
Iatitude2,  longitude2. 

Query: 

SELECT  UAV1. latitude,  UAV2. latitude,  UAV1. longitude,  UAV2.  Longitude  FROM  UAV1  Rh  UAV2  R2 
WHERE  R1.latitude>R2.latitude  AND  R2.longitude>R2.longitude  AND  R1.0BJ_DESC="blue  force"  and 
R2.0BJ_DESC="b!ue  force"  AND  R1.latitude>2  AND  R2.latitude>  2  AND  R1.longitude>2  AND 
R2.longitude>2 


Query  Plan: 


Operation 

Sub  SQL&  JOINS 

Operandl 

Operandl 

Loc 

Operand  2 

Operand  2 
Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  *  FROM  UAV1  R1 
WHERE 

R1.latitude>2  AND 

R1.longitude>2 

And 

R1.0BJ  DESC=”lmage” 

Ri 

1 

Null 

Null 

R11 

1 

Move 

Null 

Ru 

1 

Null 

Null 

R11 

2 

Select 

SELECT  *  FROM  UAV2  R2 
WHERE 

R2.latitude>2  AND 

R2.longitude>2 

And 

R2.0BJ  DESC=”lmage” 

r2 

2 

Null 

Null 

R21 

2 

UNION 

Ru 

2 

R21 

2 

Ui 

2 

Project 

Latitude,  Longitude 

U1 

2 

' 

' 

Pi 

Move 

Null 

Pi 

2 

Null 

Null 

Pi 

3 
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8.  A  sample  query  depicting  three  joins. 


Query: 

"SELECT  *  FROM  UAV1  Rh  UAV2  R2,  UAV3  Rx  UAVO  R0  WHERE  R2. timestamp  LIKE  k  AND  R2. timestamp 
LIKE  j  AND  R3.Obj_type='l'  AND  R1.latitude>R2.latitude  AND  R2.longitude>  R3.longitude  and 
R0.latitude>R3.latitude 


Query  Plan: 


Operation 

Sub  SQL  &  JOINS 

Operan 

dl 

Operandl 

Loc 

Operand  2 

Operand  2  Loc 

Result 

Name 

Result 

Loc 

Select 

SELECT  *  FROM 

UAV1  WHERE 

TIME_STAMP  LIKE  k 

R1 

1 

Null 

Null 

R11 

1 

Move 

Null 

R11 

1 

Null 

Null 

R11 

2 

Select 

SELECT  *  FROM 

UAV2  WHERE 

TIME_STAMP  LIKE  j 

R2 

2 

Null 

Null 

R21 

2 

Join 

Latitude  >  Latitude 

R11 

2 

R21 

2 

J1 

2 

Move 

Null 

J1 

2 

Null 

Null 

J1 

3 

Select 

SELECT  *  FROM 

UAV3  WHERE 

OBJ_TYPE=1 

R3 

3 

Null 

Null 

R31 

3 

Join 

Longitude  > 

Longitude 

J1 

3 

R31 

3 

J2 

3 

Move 

Null 

J2 

3 

Null 

Null 

J2 

0 

Select 

SELECT  *  from 

UAVO  WHERE 

OBJ_TYPE=1 

RO 

0 

Null 

Null 

R01 

0 

Join 

Latitude>Latitude 

J2 

0 

R01 

0 

J3 

0 
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8.6  SIMULATION  RESULTS 


Simulations  are  done  considering  a  network  of  12  UAVs  in  an  area  of  140  X  140  NM2.  Our  performance 
metrics  include  1)  Average  connectivity  &  Average  Reach-ability  for  routing,  2)  Average  roundtrip  time, 
Average  hop  count  and  Replica  hit%  for  Replication  and  3)  Average  Query  execution  time  for  analyzing  the 
behavior  of  move  and  join  operations  in  static  query  plans  with  respect  to  our  architecture. 

Simulation  Environment:  We  have  built  a  simulation  environment  in  JAVA  applet  and  have  used  Databases 
for  UAV  memory.  The  simulation  parameters  are  listed  as  follows 


Parameter 

Default  Value 

Range 

Simulation  Area 

140  X  140  NM2 

140X140 -245X245NM2 

Database  size  n 

50 

10-50 

Data  size 

1  MB 

Number  of  nodes 

12 

Bandwidth 

1Mbps 

Transmission  Range 

300  meters 

Velocity 

.8  NM/second 

.8  -  2.4  NM/second 

Query  generation  time 

1  second 

8.6.1  ROUTING 

We  did  the  following  experiments  to  evaluate  routing  with  respect  to  the  current  simulation 

system 

a)  Mobility  VS  Average  Reach-ability. 

b)  %  increase  in  simulation  area  VS  Average  Reach-ability 

c)  Time  period  of  alive  messages  VS  %  Path  Efficiency 

d)  Mobility  Vs  Average  Connectivity 

a)  Mobility  VS  Average  Reach-ability: 

The  first  experiment  is  to  find  the  average  number  of  UAV  reachable  in  the  network.  Reach¬ 
ability  is  an  important  performance  metric,  which  helps  to  analyze  the  success  of  UAV's  contacting  other 
UAV's  for  data.  The  availability  of  data  in  the  network  decreases  proportionally  with  decrease  in  average 
reach-ability.  We  compute  the  average  reach-ability  as  follows 

Average  Reach-ability  =  Sum  of  nodes  reachable  from  all  nodes  /  Total  no  of  nodes. 
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The  reach-ability  changes  with  respect  to  change  in  network  topology.  In  this  experiment  we  calculate 
reach-ability  at  various  time  intervals  with  a  different  network  topology  to  identify  at  most  reach-ability 
for  maximum  number  of  UAVs  (12)  at  various  speeds.  Every  interval  on  x-axis  constitutes  average  reach¬ 
ability  computed  every  20  seconds.  Thus,  this  experiment  is  done  for  100  different  network  topologies. 
On  average,  the  reach-ability  decreases  with  increase  in  speed,  with  maximum  reach-ability  of  9.8  is 
achieved  at  a  speed  of  .8  nautical  miles  /second  (equivalent  to  .92  miles/second). 


12 


messages 


The  95%  confidence  interval  for  each  samples  across  the  x  axis  are  shown  as  follows 


Speed\lnterval 

1 

2 

3 

4 

5 

.8  NM 

10.8±.4 

11±0 

1110 

1110 

9.81.6 

1.6  NM 

9±  1.2 

9.6  ±.6 

911.2 

811.8 

8.712.1 

2.4  NM 

7±2 

7.2±1.6 

6.711.8 

5.411.8 

7.311.8 

Speed\lnterval 

6 

7 

8 

9 

10 

.8  NM 

11±0 

9±1.1 

101.7 

710 

7.41.8 

1.6  NM 

6.6±2.1 

10.61.5 

9.711.7 

10.61.5 

9.811 

2.4  NM 

6±1.3 

711.7 

7.911.8 

712.0 

6.312.4 

b)  %  increase  in  simulation  area  VS  Average  Reach-ability 

We  evaluated  Average  Reach-ability  with  respect  to  simulation  area.  We  increased  the  simulation 
area  in  terms  of  25%,  50%  and  75%.We  have  initial  simulation  area  of  140X140  NM.  We  increased 
a)  25%=  175X175  NM 
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b)  50%  =210X210  NM 

c)  75%  =  245X245  NM 

The  Average  reach-ability  decreases  with  increase  in  area  size.  This  is  due  to  the  wider  placement  of 
UAVs  in  larger  areas,  which  causes  too  many  disconnections  forming  isolated  UAVs.  The  results  are 
plotted  and  graph  is  shown  as  follows 


The  95%  confidence  interval  for  samples  across  x  axis  are  shown  as  follows 


%  increase  in  simulation  area 

0 

25% 

50% 

75% 

111.0 

8.2±1.6 

7.211.9 

6.812.1 

c)  Time  period  of  Alive  messages  vs  %  Path  efficiency: 

In  this  experiment,  we  try  to  find  the  best  time-period  to  broadcast  ALIVE  messages.  ALIVE  messages 
contain  the  updated  link  information,  which  is  used  to  reconstruct  the  routing  table.  The  time  interval  in 
which  the  alive  messages  are  sent  should  be  optimal,  such  that  the  available  path  to  send  data  is  valid 
and  also  the  shortest  path  to  the  destination.  We  evaluate  the  %path  efficiency  for  different  movement 
models  sending  alive  messages  at  various  time  intervals  between  10  and  60  seconds  for  12  UAV's.  We 
determine  the  path  efficiency  of  the  routing  table  based  on  the  formula  below: 

%  Path  Efficiency=  (no.  of  valid  and  shortest  paths  in  routing  table/  Total  no.  of  paths)  *  100 
The  topology  change  varies  with  respect  to  speed,  hence  we  evaluate  with  respect  to  speed.  The  delay 
in  ALIVE  messages  increases  the  delay  in  updates,  which  causes  the  construction  of  erroneous  routing 
table;  hence  there  is  a  decrease  in  efficiency  with  increase  in  time  interval  of  alive  messages. 
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At  10  &  20  seconds  interval,  updates  are  fast,  hence  the  path  in  the  routing  table  are  almost 
valid  bringing  high  efficiency  irrespective  of  speed.  At  high  mobility  (1.6  &  2.2  miles/second),  network 
topology  changes  rapidly  and  updates  are  also  slow  at  50  &  60  seconds.  Hence,  the  path  efficiency  is 
very  less  as  compared  to  path  efficiency  at  .8  miles/second.  We  conclude  that  30  seconds  as  ideal  Alive 
message  broadcast  time  since  broadcast  at  reduced  time  interval  cause  too  much  congestion  in 
network.  Moreover,  efficiency  at  30  seconds  is  upright,  i.e.  the  difference  between  path  efficiencies  at 
various  speed  is  not  that  high. 

The  95%  confidence  intervals  of  the  plotted  samples  are  shown  as  follows 


Time  period  of  Alive  messages 

10 

20 

30 

40 

50 

60 

.8 

97.91.9 

9613.8 

95.313.4 

9415.9 

9315.8 

92.416 

1.6 

96.312.4 

9513.1 

94.113.6 

9114.4 

8716.4 

87.213 

2.4 

95.612.1 

94.812.1 

92.215.3 

9114.7 

8615.9 

85.615 

d)  Mobility  VS  Average  Connectivity: 

With  respect  to  connections,  we  are  restricting  the  connections  for  "UAV  0"  to  3;  the  remaining 
UAV's  can  have  as  many  connections  depending  on  the  numbers  of  UAV  lying  within  their  transmission 
range. 

Average  Connectivity=Total  no.  of  connections  for  each  node/Total  no.  of  nodes 

The  connectivity  changes  with  respect  to  change  in  network  topology.  In  this  experiment  we  calculate 
connectivity  at  various  time  intervals  with  a  different  network  topology  to  identify  at  most  connectivity 
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for  maximum  number  of  UAVs  (12)  at  various  speeds.  Every  interval  on  x-axis  constitutes  average 
connectivity  computed  every  20  seconds.  Thus,  this  experiment  is  done  for  100  different  network 
topologies. 


messages 


The  average  connectivity  decreases  with  increase  in  speed,  it  is  obvious  that  high  mobility  causes  too 
many  disconnections  decreasing  the  connectivity.  The  connectivity  at  0.8  NM/second  mobility  is 
maximum  with  an  average  of  3.98,  which  means  for  12  UAVs  every  UAV  has  degree  4  as  an  average  at 
,8NM/second. 

The  95%  confidence  interval  for  the  samples  are  shown  as  follows 


Speed\lnterval 

1 

2 

3 

4 

5 

.8  NM 

3.81.26 

3.81.26 

4+0 

4.21.26 

4+0 

1.6  NM 

2.8±  .39 

3.31.41 

3.71.83 

4.1  +  .20 

3.21.39 

2.4  NM 

2.71.29 

1.9+.20 

2.2+.40 

3.51.33 

2.81.26 

Speed\lnterval 

6 

7 

8 

9 

10 

.8  NM 

4.41.32 

4.1  +  .19 

3.41.32 

4.11.35 

4+0 

1.6  NM 

3.3+.41 

3.41.32 

3.91.35 

3.81.39 

2.91.73 

2.4  NM 

3.3+.71 

3.3+.41 

3.21.39 

3.71.65 

2.51.43 
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8.6.2  REPLICATION 


We  have  two  access  models  for  validating  the  replication  scheme. 

I)  Access  based  on  Load  balancing  Score  Function:(ABL) 

n 

ScoreN  =  [(SizeData  /  BWmin)  +  hopcount-1]  +  2  LWi  *  Pi 

i=0 

SizeData  Size  of  the  requested  data  item. 

BWmin<-  Minimum  bandwidth  across  the  path  between  source  to  destination. 

LW,  Load  Weight  at  node  i,  higher  the  load  causes  more  time  for  query  to  get 
Processed. 

P,  <rj  4  N  is  the  destination  (or)  IS 
'-l  N  is  Intermediate 

IS<-  refers  to  the  intermediate  data  source,  which  lies  at  a  shorter  distance  along  the  path  to 
another  data  source  for  the  same  data. 

With  respect  to  above  replication  scheme,  there  will  be  two  copies  for  any  data  item,  request  will  be 
send  to  UAV  having  a  lesser  ScoreN.  We  apply  random  load  weights  at  each  node;  our  load  weight  varies 
between  1  to  3.  We  have  kept  the  delay  due  to  single  load  as  10  milliseconds 
II)  Access  based  on  reduced  hop  count:  (ABR) 

Request  will  be  send  to  UAV,  which  can  be  reached  through  lesser  number  of  hops  from  source  UAV. 

We  did  the  following  set  of  experiments  to  validate  replication  scheme 
a  )  Number  of  data  items  VS  Average  hop  count, 
b  )  Number  of  data  items  VS  Average  roundtrip  time, 
c  )  Number  of  data  items  VS  Cache  hit  ratio. 

a)  Number  of  data  items  VS  Average  hop  count 

Average  hop  count  is  one  of  the  ideal  performance  metrics,  which  helps  to  analyze  the 
efficacy  of  data  access  due  to  replication. 

Average  hop  count=  Hi/  Nq 
Nq  number  of  queries 
H|  <-  hop  count  to  reach  the  destination 

We  validate  Average  hop  count  with  respect  to  the  two  access  models  (ABL  &  ABR).  We  measure 
the  average  hop  count  by  varying  the  number  of  data  items  from  10  to  50.  We  follow  ZIP-f  distribution 
based  query  access  with  K=15  and  a=0.8.  The  number  of  queries  increases  exponentially  with  increase  in 
number  of  data  items.  We  execute  the  average  hop  count  formula  for  each  case  and  the  results  are 
plotted  as  follows. 
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ABR  favors  hop  count;  hence  the  average  hop  count  is  less  compared  to  ABL  for  all  the  cases. 
ABL  favors  response  time.  Though  ABL  has  hop  count  in  its  equation,  distant  source  might  be  chosen 
based  on  reduced  load  across  the  path  and  lesser  estimated  response  time.  ABR  never  cares  about  load 
or  response  time  and  considers  only  hop  count.  Hence,  ABR  provides  better  hop  count  as  compared  to 
ABL. 

The  95  %  confidence  interval  for  samples  in  x  axis  is  as  follows 


No  of  Data  items 

10 

20 

30 

40 

50 

ABL 

1.74±.18 

1.7±.14 

1.7±.25 

1.83±.2 

1.9±.18 

ABR 

1.49±  .17 

1.39  ±.09 

1.48±.ll 

1.45±.09 

1.57±.15 

b)  Number  of  data  items  VS  Average  Round  trip  time 

Average  Roundtrip  time  is  also  one  another  performance  metric  determining  the  efficacy  of  data 
access.  This  performance  metric  too,  is  highly  dependent  on  the  access  models  (ABL  &  ABR). 

Average  Round  trip  time=  (1/Nq)  *  —  Trs ) 

Nq  <-  number  of  queries 

Tr  <-time  stamp  in  which  request  is  send. 

Trs  <-  time  stamp  in  which  the  first  initial  response  is  received. 

We  measure  the  average  round  trip  time  by  varying  the  number  of  data  items  from  10  to  50.  We 
follow  ZIP-f  distribution  based  query  access  with  K=15  and  a=0.8.  We  execute  the  average  roundtrip 
time  formula  for  each  case  and  the  results  are  plotted.  ABR  has  better  roundtrip  time  since  it  chooses 
data  from  nodes  at  shorter  distance;  the  first  part  of  ABL  score  function  chooses  source  from  which  data 
can  be  retrieved  at  a  shorter  response  time.  But  in  case  of  roundtrip  time,  we  consider  only  the  first 
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initial  response.  Since  ABL  has  increased  hop  count,  the  delay  due  to  load  across  the  path  is  high  as 
compared  to  ABR.  Since  average  hop  count  gives  preference  to  shorter  hops,  the  load  is  very  less  and 
the  delay  to  get  first  initial  response  is  less.  Thus,  the  average  roundtrip  time  of  ABR  is  less  as  compared 
to  ABL. 
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The  95%  confidence  interval  for  samples  is  as  follows 


No  of  Data  items 

10 

20 

30 

40 

50 

ABL 

171.4132.8 

174.2127 

172.2127 

183.3129.6 

182.8118.9 

ABR 

139.8123.1 

135.2  18 

154.4113 

141.3115.4 

151.6124.1 

c)  Number  of  data  items  Vs  Cache  hit  ratio: 

In  this  experiment,  we  try  to  find  effectiveness  of  the  replication  algorithm.  We  analyzed  the 
replica  hit  %,  by  varying  the  number  of  data  items  from  10  to  50.  The  formula  is  given  below: 

Replica  Data  hit  %  =  No  of  replica  hits/  Nq. 

Nq  <r  number  of  queries 
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ABR  give  preferences  to  shorter  hops  irrespective  of  replicas,  moreover  ABL  uses  bandwidth,  which 
is  used  by  the  replication  algorithm  too.  Hence  the  replica  hit  %  of  ABL  is  good  compared  to  ABR. 

The  95%  confidence  interval  for  samples  are  shown  as  follows 


No  of  Data  items 

10 

20 

30 

40 

50 

ABL 

45.17±3.34 

50.9±4.9 

41.8±4.15 

40.9±4.52 

40.98±6.24 

ABR 

32.46±15.6 

35.4  ±8.1 

30.4±7.6 

33.1±6.9 

27.55±4.67 

8.6.3  STATIC  QUERY  EXECUTION 

We  did  the  following  set  of  experiments  for  evaluating  replication  with  respect  to  the  current 
simulation  system. 

a)  Query  Execution  time  Vs  Bandwidth  shared  &  No  of  data  items 

b)  Query  Execution  time  Vs  Number  of  data  items  for  various  queries 

c)  Number  of  data  items  Vs  Number  of  joins 

a)  Number  of  data  items  Vs  Query  Execution  time: 

We  evaluated  the  query  execution  time  with  3  UAVs  for  query  plan  shown  in  table  1.3  by  increasing 
the  number  of  data  items  and  bandwidth  shared  by  them  from  the  broker.  The  query  execution  time 
increases  with  increase  in  number  of  data  items,  which  is  due  to  the  join  operation;  hence  the  query 
execution  time  for  750  records  is  less  than  1000  records.  Similarly,  the  query  execution  time  increases 
with  reduction  in  bandwidth  shared,  this  is  due  to  the  delay  incurred  by  the  move  operation  on 
transferring  the  query  results.  The  plotted  results  are  as  follows 
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The  95%  confidence  intervals  for  samples  are  as  follows 


Bandwidth  shared 

360 

300 

250 

750 

1384.8±72.8 

1448.41129.4 

1558.4+49.2 

1000 

1509.8±102.7 

1597123.1 

1686.6+46.6 

200 

150 

750 

1579.41147. 5 

1616.8173.5 

1000 

1702.2192.3 

1715.4+59.1 

b)  Query  Execution  time  Vs  Number  of  data  items  for  various  queries 

We  evaluated  the  query  execution  time  for  all  the  query  plans  in  section  (E)  by  varying  the 
number  of  data  items.  At  each  step,  we  increase  the  number  of  records  in  the  network  by  capturing 
more  data.  The  number  of  records  in  the  network  is  increased  from  500  to  1000  records  and  various 
queries  are  executed.  The  query  execution  time  is  found  and  the  results  of  the  above  experiment  are 
plotted  as  shown  below 
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Query  Execution  time  vs.  No.  of  records 


2500 


2000 


500  records  750  records  1000  records 

Total  no.  of  records  in  the  network 


■  Query  0 

■  Query  1 

■  Query  2 

■  Query  i 

■  Query  4 

■  Query  5 

■  Query  6 
Query  7 


The  query  execution  time  increases  as  the  number  of  records  in  the  network  increases.  This  is  because 
the  time  taken  to  join  tables,  combine  tables  and  transfer  data  between  UAV's  increases  with  the 
increase  in  total  number  of  records.  We  compare  the  execution  times  for  various  benchmark  queries 
and  compare  them  in  the  above  graph.  The  increase  due  to  number  of  records  is  not  large  because  only 
the  meta-data  is  transferred  between  UAV's. 


c)  Number  of  data  items  Vs  Number  of  joins 

We  evaluated  the  query  execution  time  for  various  numbers  of  joins  in  the  query;  the  following 
query  plans  in  Section  E  have  the  corresponding  number  of  joins 
Query  5<-  Single  Join 
Query  0<-Double  Join 
Query  8<~T riple  Join 

We  vary  the  number  of  records  for  each  set  and  the  results  are  plotted  as  follows 
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The  query  execution  time  increases  will  increase  with  respect  to  increase  in  number  of  joins  in  the 
query  plan.  Moreover,  the  increase  in  data  item  too  increases  the  query  execution  time,  due  to  more 
processing  time  while  joining  the  data  items. 

The  95%  confidence  interval  for  the  above  plotted  results  is  as  follows 


Bandwidth  shared 

Single 

Double 

Triple 

500 

842.4±98.14 

1263±73.3 

1616.6±91.5 

750 

935.6±51.9 

1321.2±103.7 

1735. 2±70.83 

1000 

999.6±40.3 

1479.4±72.4 

1988.2165.17 
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9.  SOFTWARE  DEVELOPMENT 

Two  pieces  of  software  have  been  developed  for  this  project: 


A  query  plan  generator  described  in  Section  3.  This  includes  the  software  for  generating  connectivity 
configurations.  Experimental  results  are  in  Section  3.6. 


A  simulator  described  in  Section  5.  Experimental  results  of  the  simulator  are  also  described  in  Section  5. 


All  the  software  developed  for  this  project  will  be  delivered  to  AFRL  on  a  DVD  or  CD  Rom.  They  can  also 
be  downloaded  by  right-clicking  and  downloading  on  the  following  (hidden)  links: 


http://itlab.uta.edu/downloads/QPGSETUP.zip 

http://itlab.uta.edu/downloads/SIMULATORSETUP.zip 

http://itlab.uta.edu/downloads/setup-instructions(qpg).zip 

http://itlab.uta.edu/downloads/setup-instructions(simulator).zip 
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10.  RESULTS  AND  DISCUSSIONS 


Collaborative  airborne,  ground/surface  based,  manned  and  unmanned  missions  are  expected  to  become 
more  prevalent  in  the  future. 

We  have  provided  an  insight  into  some  of  the  fundamental  issues  with  regard  to  scalable,  fault  tolerant 
repositories.  We  have  provided  a  use  case  analysis  that  allows  the  focusing  of  technological  needs  in  a 
complex  tactical  scenario. 

We  have  provided  insight  into  what  the  Metadata  management  should  look  like,  optimized  Query  plan 
generation  algorithms  and  heuristics  to  this  scenario  which  differs  from  traditional  distributed  query 
processing,  and  approaches  to  Replication  management.  We  have  provided  simulations  of  the  above 
that  allow  easier  visualization  and  understanding  of  some  of  these  complex  issues.  The  software 
developed  has  also  been  provided. 

This  research  has  enabled  an  understanding  of  the  issues,  and  has  raised  a  number  of  other  fruitful 
research  needs: 

Collaboration  of  teams  of  manned  and  unmanned  systems  will  require  a  different  type  of  command  and 
control  system.  This  next  generation  C2  system  must  work  at  a  number  of  different  levels.  It  will  need  to 
provide  planning,  strategic  control,  and  tactical  control.  It  will  need  to  understand  that  systems  and 
platforms  may  embark  on  missions,  without  access  to  communications,  and  then  rejoin. 

The  data  repositories  will  be  distributed,  highly  mobile,  and  dynamic.  The  ISR  requirements  dictate  that 
massive  amounts  of  data  will  need  to  be  collected  and  analyzed  autonomously  in  tactical  timeframes 

In  summary,  this  project  has  established  the  foundation  for  this  problem  by  identifying  the  sub 
problems,  issues  in  each  sub  problem,  and  has  investigated  how  they  all  come  together  towards  the 
solution  of  the  initial  problem.  This  effort  is  by  no  means  complete.  In  each  section,  we  have  articulated 
the  need  for  extending  proposed  solutions  to  reach  a  practically  useful  approach/solution. 

Furthermore,  we  have  not  addressed  all  the  problems  identified  in  the  proposed  architecture  shown  in 
Figure  1.  There  are  a  number  of  important  problems  that  need  to  be  addressed  to  obtain  a  complete 
solution.  We  will  be  happy  to  explore  opportunities  with  AFRL  or  other  agencies  to  continue  this  work. 

In  this  project,  we  have  implemented  proof-principle  systems  to  show  the  effectiveness  of  the  proposed 
approaches.  The  simulator  has  been  extended  to  incorporate  features  specific  to  this  scenario  and 
analysis  has  been  performed.  The  next  logical  step  is  to  create  a  prototype  by  bringing  all  the 
components  together  and  then  move  towards  a  testbed.  The  team  has  been  chosen  from  the  outset 
keeping  the  long  term  goals  in  mind  and  team  is  well-positioned  for  additional  work  on  this  project. 

11.  CONCLUSIONS 

The  project  involved  the  development  of  a  middleware  for  fault-tolerant  computing  in  a 
distributed  network  of  UAVs,  each  carrying  data  repositories.  Each  node  or  UAV  in  the  network 
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is  host  to  metadata,  sourced  at  the  node  as  well  as  replicated.  The  developed  middleware 
serves  the  main  purpose  of  responding  to  queries.  We  have  developed  methods  for  handling 
use  case  scenarios,  metadata  management,  replicated  data  management,  query  plan 
generation  algorithms,  and  network  management. 

Currently,  a  static  plan  is  generated  for  each  query  and  the  query  is  processed  sequentially  by 
performing  a  sequence  of  operations  at  different  nodes.  This  can  be  further  improved  by 
generating  a  dynamic  plan  at  each  node  where  the  query  is  processed  to  accommodate 
intermittent  connectivity  among  nodes.  Furthermore,  a  query  plan  can  be  executed  in  parallel 
(using  either  static  or  dynamic  plans)  to  improve  response  time.  This  can  be  extended  to  multi¬ 
copy  and  partial  replica  again  to  accommodate  intermittent  connectivity.  Above-mentioned 
extensions  will  add  complexity  to  both  Meta  data  management  and  query  processing. 

In  conclusion,  there  are  a  number  of  important  problems  that  need  to  be  addressed  to  obtain  a 
complete  solution.  The  team  will  be  happy  to  explore  opportunities  with  AFRL  or  other  agencies  to 
continue  this  work. 
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List  of  Acronyms 


AAA  -  Anti  Aircraft  Artillery 

ACK  -  Acknowledgement 

AWAC  -  Airborne  Warning  and  Control  System 

BER  -  Bit  Error  Rate 

CNF  -  Conjunctive  Normal  Form 

DMZ  -  De  Militarized  Zone 

FLOT  -  Forward  Line  Own  Troops 
ID  -  Identifier 

JAOC  -  Joint  Air  Operations  Center 
JTAC  -  Joint  Terminal  Attack  Controller 
LS  -  Link  Stability 

MIO  -  Metadata  Information  Object 

NAK-  Negative  Acknowledgement 

NM  -  Nautical  Mile 

RSS  -  Received  Signal  Strength 

SAM  -  Surface  to  Air  Missile 

SEAD  -  Suppression  of  Enemy  Air  Defenses  (SEAD) 

SOA  -  Service  Oriented  Architecture 

SPJ  -  Select-Project-Join 

SQL -Structured  Query  Language 

TCP -Transport  Control  protocol 

UAV  -  Unmanned  Aerial  Vehicles 

UDP  -  User  Datagram  protocol 
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